Abstract

We consider the following problem, in which a given number of items has to be chosen from a predefined
set. Each item is described by a vector of attributes, and for each attribute there is a desired distribution that
the selected set should have. We look for a set that fits the desired distributions on all attributes as closely
as possible. Examples of applications include choosing members of a representative committee, where
candidates are described by attributes such as sex, age and profession, and where we look for a committee
that for each attribute offers a certain representation, i.e., a single committee that contains a certain number
of young and old people, a certain number of men and women, a certain number of people with different
professions, etc. With a single attribute the problem collapses to the apportionment problem for party-list
proportional representation systems (in that case the value of the single attribute would be the political
affiliation of a candidate). We study the properties of the associated subset selection rules, as well as their
computational complexity.


1 Introduction 

A research department has to choose k members for a recruiting committee. The selected committee should
be gender balanced, ideally containing 50% men and 50% women. Additionally, the committee should
represent different research areas in certain proportions: ideally it should contain 55% of researchers
specializing in area A, 25% of experts in area B, and 20% in area C. Another requirement is that the committee
should contain 30% junior and 70% senior researchers, and finally, the split between local and external
members should be kept in proportions 30% to 70%. The pool of possible members is the following:


Name      Sex   Group   Age   Affiliation
Ann       F     A       J     L
Bob       M     A       J     E
Charlie   M     A       S     L
Donna     F     B       S     E
Ernest    M     A       S     L
George    M     A       S     E
Helena    F     B       S     E
John      M     B       J     E
Kevin     M     C       J     E
Laura     F     C       J     L


In the given example, if the department wants to select k = 3 members, then it is easy to see that there
exists no committee that would ideally satisfy all the criteria. Nevertheless, some committees are better
than others: intuitively, we feel the sex ratio should be equal either to 2:1 or to 1:2, the area ratio should be
equal to 2:1:0, the age ratio to 1:2, and the affiliation ratio to 1:2. Such relaxed criteria can be achieved by
selecting Ann, Donna, and George. Now, let us consider the above example for the case when k = 4. In such
a case, the ideal sex ratio should be equal to 2:2, the research area ratio to 2:1:1, the age ratio to 1:3, and the
affiliation ratio to 1:3. It can be proved, however, that for k = 4 there exists no committee satisfying such
relaxed criteria. Intuitively, in such a case the best committee is either {Ann, Charlie, Donna, George}, with
two externals instead of three, or {Charlie, Donna, George, Kevin}, with males being over-represented.

In this paper we formalize the intuition given in the above example and define what it means for a
committee to be optimal. When looking for an appropriate definition we follow an axiomatic approach. First, we
notice that our model generalizes the apportionment problem for proportional representation. The central
question of the apportionment problem is how to distribute parliament seats between political parties, given
the numbers of votes cast for each party. Indeed, we can consider our multi-attribute problem with the
single attribute being the political affiliation of a candidate, and the desired distributions being the proportions
of votes cast for the different parties. In such a case, selecting a committee in our multi-attribute
proportional representation system boils down to selecting a parliament according to some apportionment
method.

There is a variety of apportionment methods studied in the literature. In this paper we do not review
these methods in detail (we refer the reader to the survey of Balinski and Young), but rather focus
on a specific set of their properties that have been analyzed, namely non-reversal, exactness and respect of
quota, population monotonicity, and house monotonicity. We define the analogs of these properties for the
multi-attribute domain, and analyze our definition of an optimal committee for a multi-attribute domain with
respect to these properties.

To emphasize the analogy between our model and the apportionment methods, we should provide some 
discussion on where the desired proportions for attributes come from. Typically, but not always, they come 
from votes. For instance, each voter might give her preferred value for each attribute, and the ideal proportions 
coincide with the observed frequencies. For instance, out of 20 voters, 10 would have voted for a male and 
10 for a female, 13 for a young person and 7 for a senior one, etc. It is worth mentioning that the voters 
might cast approval ballots, that is for each attribute they might define a set of approved values rather than 
pointing out the single most preferred one. On the other hand, sometimes, instead of votes, there are “global” 
preferences on the composition of the committee, expressed directly by the group, imposed by law, or by 
other constraints that should be respected as much as possible independently of voter preferences. 

The multi-attribute case, however, is also substantially different from the single-attribute one. In particular,
multi-attribute proportional representation systems exhibit computational problems that do not appear in
the single-attribute setting. Indeed, in the second part of our paper we show that finding an optimal committee
is often NP-hard. However, we show that this challenge can be addressed by designing efficient approximation
and fixed-parameter tractable algorithms.

After positioning our work with respect to related areas in Section 2, we present our model in Section 3.
In Sections 4 and 5 we discuss relevant properties of methods for multi-attribute fair representation. In
Section 6 we show that, although the computation of optimal committees is generally NP-hard, there exist
good approximation and fixed-parameter tractable algorithms for finding them. In Section 7 we point to
further research issues.


2 Related work 


Our model is related to three distinct research areas: 


Voting on multi-attribute domains (see the work of Lang and Xia for a survey). There, the aim is to
output a single winning combination of attributes (e.g., in multiple referenda, a combination of binary values).
When k = 1, our model can be viewed as a voting problem on a constrained multi-attribute domain
(constrained because not all combinations are feasible).


Multiwinner (or committee) elections. In particular, our model is related to the problem of finding a fully
proportional representation. There, the voters vote directly for candidates and do not consider the attributes
that characterize them. Thus, in this literature, the term “proportional representation” has a different
meaning: these methods are ‘representative’ because each voter feels represented by some member of the
elected committee. The computational aspects of fully proportional representation and its extensions have
attracted a lot of attention lately. Our study of the properties of multi-attribute proportional representation is
close in spirit to the work of Elkind et al., who give a normative study of multiwinner election rules.
Budgeted social choice is technically close to committee elections, but it has a different motivation: the
aim is to make a collective choice about a set of objects to be consumed by the group (perhaps subject to
some constraints) rather than about the set of candidates to represent voters.

Apportionment for party-list representation systems (see the work of Balinski and Young for a survey).
As we already pointed out, the apportionment methods correspond to the restriction of our model to a
single attribute (albeit with a different motivation). While voting on multi-attribute domains and multiwinner
elections have led to significant research effort in computational social choice, this is less the case for
party-list representation systems. Ding and Lin studied a game-theoretic model for a party-list proportional
representation system under specific assumptions, and showed that computing the Nash equilibria of the game
is hard. Also related is the computation of bi-apportionment (assignment of seats to parties within regions),
investigated in a few recent papers.

Constrained approval voting (CAP) is probably the closest work to our setting (MAPR). In CAP
there are also multiple attributes, candidates are represented by tuples of attribute values, there is a target
composition of the committee, and we try to find a committee close to this target. However, there are also
substantial differences between MAPR and CAP. First, in CAP, the target composition of the committee,
exogenously defined, consists of a target number of seats for each combination of attributes (called a cell),
that is, for each $z \in D_1 \times \dots \times D_p$ we have a value $s(z)$, while in MAPR we have a smaller input consisting
of a target number for each value of each attribute. Note that the input in CAP is exponentially large in
the number of attributes, which makes it infeasible in practice as soon as this number exceeds a few units
(CAP was probably designed only for very small numbers of attributes, such as 2 or 3). Second, in CAP,
the selection of an optimal committee is made in two consecutive steps: first a set of admissible
committees is defined, and then the choice between these admissible committees is made by using approval ballots;
the chosen committee is the admissible committee maximizing the sum, over all voters, of the number of
candidates approved (there is no loss function to minimize as in MAPR). A simple translation of CAP into
an integer linear programming problem has been given in the literature.


3 The model 

Let $X = \{X_1, \dots, X_p\}$ be a set of $p$ attributes, each with a finite domain $D_i = \{x_i^1, \dots, x_i^{q_i}\}$. We say
that $X_i$ is binary if $|D_i| = 2$. We let $D = D_1 \times \dots \times D_p$. Let $C = \{c_1, \dots, c_m\}$ be a set of candidates,
also referred to as the candidate database. Each candidate $c_i$ is represented as a vector of attribute values
$(X_1(c_i), \dots, X_p(c_i)) \in D$.¹

For each $i \le p$, by $\pi_i$ we denote a target distribution $\pi_i = (\pi_i^1, \dots, \pi_i^{q_i})$ with $\sum_{j=1}^{q_i} \pi_i^j = 1$. We set
$\pi = (\pi_1, \dots, \pi_p)$. Typically, $n$ voters have cast a ballot expressing their preferred value on every attribute
$X_i$, and $\pi_i^j$ is the fraction of voters who have $x_i^j$ as their preferred value for $X_i$, but the results presented in
the paper are independent of where the values $\pi_i^j$ come from (see the discussion in the Introduction).

The goal is to select a committee² of $k \in \{1, \dots, m\}$ candidates (or items) such that the distribution
of attribute values is as close as possible to $\pi$. Formally, let $S_k(C)$ denote the set of all subsets of $C$ of
cardinality $k$. Given $A \in S_k(C)$, the representation vector for $A$ is defined as $r(A) = (r_1(A), \dots, r_p(A))$,
where $r_i(A) = (r_i^j(A) \mid 1 \le j \le q_i)$ for each $i = 1, \dots, p$, and $r_i^j(A) = \frac{|\{c \in A : X_i(c) = x_i^j\}|}{k}$.

Definition 1 A committee $A \in S_k(C)$ is perfect for $\pi$ if $r_i(A) = \pi_i$ for all $i$.

¹ By writing $X_j(c_i)$, we slightly abuse notation, that is, we consider $X_j$ both as an attribute name and as a function that maps any
candidate to an attribute value; this will not lead to any ambiguity.

² We will stick to the terminology “committee” although the meaning of subsets of candidates sometimes has nothing to do with the
election of a committee.

Thus, a perfect committee matches the target distribution exactly. Clearly, there is no perfect committee if
for some $i, j$, $\pi_i^j$ is not an integer multiple of $\frac{1}{k}$. In some of our results we will focus on target distributions
such that for each $i, j$ the value $k \pi_i^j$ is an integer. We will refer to such target distributions as natural
distributions.

We define metrics measuring how well a committee fits a target distribution, called loss functions. 

Definition 2 A loss function $f$ maps $\pi$ and $r(A)$ to $f(\pi, r(A)) \in \mathbb{R}$, and satisfies $f(\pi, r(A)) = 0$ if and only if
$\pi = r(A)$.

There are a number of loss functions that can be considered. As often, the most classical loss functions
use $L^p$ norms, the most classical examples being $L^1$, $L^2$, and $L^\infty$. We focus on two representative $L^p$
norms, $L^1$ and $L^\infty$, but we believe that other choices are also justified and may lead to interesting variants
of our model. Consequently, we consider the following loss functions:

• $\|\cdot\|_1$ : $\|\pi, r(A)\|_1 = \sum_{i,j} |r_i^j(A) - \pi_i^j|$.

• $\|\cdot\|_{1,\max}$ : $\|\pi, r(A)\|_{1,\max} = \sum_i \max_j |r_i^j(A) - \pi_i^j|$.

• $\|\cdot\|_{\max}$ : $\|\pi, r(A)\|_{\max} = \max_{i,j} |\pi_i^j - r_i^j(A)|$.

Now, we are ready to formally define the central problem addressed in the paper. 

Definition 3 (OptimalRepresentation) Given $X$, $C$, $\pi$, $k$, and a loss function $f$, find a committee
$A \in S_k(C)$ minimizing $f(\pi, r(A))$.

Example 1 For the example of the Introduction, we have $X$ = {sex, group, age, affiliation}, $D = \{F, M\} \times
\{A, B, C\} \times \{J, S\} \times \{L, E\}$, and $X_1(\text{Ann}) = F$, $X_1(\text{Bob}) = M$, etc. {Charlie, Donna, George, Kevin} is
optimal for $\|\cdot\|_1$, with $\|\pi, r(A)\|_1 = 0.5 + 0.1 + 0.1 + 0.1 = 0.8$, and for $\|\cdot\|_{1,\max}$, with $\|\pi, r(A)\|_{1,\max} =
0.4$, but not for $\|\cdot\|_{\max}$. {Ann, Charlie, Donna, George} is optimal for $\|\cdot\|_{\max}$, with $\|\pi, r(A)\|_{\max} =
\max(0, 0.2, 0.05, 0.2) = 0.2$, but not for the other criteria.
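The loss values quoted in Example 1 can be recomputed mechanically. The following is a minimal sketch, assuming the candidate pool and target percentages of the Introduction; the names `POOL`, `TARGET`, `deviations`, `loss_1`, `loss_1_max` and `loss_max` are illustrative and do not come from the paper. It only evaluates the two committees named in the example and does not search for optimal committees.

```python
from fractions import Fraction as F

# Candidate pool from the Introduction: (sex, group, age, affiliation).
POOL = {
    "Ann": ("F", "A", "J", "L"), "Bob": ("M", "A", "J", "E"),
    "Charlie": ("M", "A", "S", "L"), "Donna": ("F", "B", "S", "E"),
    "Ernest": ("M", "A", "S", "L"), "George": ("M", "A", "S", "E"),
    "Helena": ("F", "B", "S", "E"), "John": ("M", "B", "J", "E"),
    "Kevin": ("M", "C", "J", "E"), "Laura": ("F", "C", "J", "L"),
}

# Target distribution pi, one dict per attribute (exact fractions).
TARGET = [
    {"F": F(1, 2), "M": F(1, 2)},
    {"A": F(11, 20), "B": F(1, 4), "C": F(1, 5)},
    {"J": F(3, 10), "S": F(7, 10)},
    {"L": F(3, 10), "E": F(7, 10)},
]


def deviations(committee, pool=POOL, target=TARGET):
    """Per attribute i, the list of |r_i^j(A) - pi_i^j| over all values j."""
    k = len(committee)
    result = []
    for i, pi_i in enumerate(target):
        counts = {value: 0 for value in pi_i}
        for name in committee:
            counts[pool[name][i]] += 1
        result.append([abs(F(counts[v], k) - pi_i[v]) for v in pi_i])
    return result


def loss_1(committee):
    """Sum of all deviations (the L1 criterion)."""
    return sum(sum(d) for d in deviations(committee))


def loss_1_max(committee):
    """Sum over attributes of the largest deviation."""
    return sum(max(d) for d in deviations(committee))


def loss_max(committee):
    """Largest deviation over all attributes and values."""
    return max(max(d) for d in deviations(committee))


first = ["Charlie", "Donna", "George", "Kevin"]
second = ["Ann", "Charlie", "Donna", "George"]
print(loss_1(first), loss_1_max(first), loss_max(second))  # 4/5 2/5 1/5
```

Exact fractions are used instead of floats so that ties between committees, which matter when comparing criteria, are not blurred by rounding.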

4 The single-attribute case 

In this section we focus on the single-attribute case ($p = 1$). Without loss of generality, let us assume that
the single attribute is party affiliation. Further, let us for a moment assume that for each value $x_1^j$ there are
at least $k$ candidates with value $x_1^j$ (this is typically the case in party-list elections). Then finding the optimal
committee comes down to the apportionment problem for party-list elections, where a fractional distribution $\pi_1$
has to be “rounded up” to an integer-valued distribution $r_1$ of the $k$ seats.

There are two main families of apportionment methods: largest remainder and highest average methods.
We shall not discuss highest average methods here, because they are only weakly relevant to our model. For
largest remainder methods, a quota $q$ is computed as a function of the number of seats $k$ and the number of
voters $n$. The number of votes for party $i$ is $n_i = n \cdot \pi_i$. The most common choice of a quota is the Hare
quota, defined as $\frac{n}{k}$; the method based on the Hare quota is called the Hamilton method.³ Our aim is to
generalize the Hamilton method to multi-attribute domains.

Definition 4 (The largest remainder method.) The largest remainder method with quota $q$ is defined as
follows:

• for all $i$, $s_i^* = \frac{n_i}{q}$ is the ideal number of seats for party $i$.

• each party $i$ receives $s_i = \lfloor s_i^* \rfloor$ seats; let $\delta_i = s_i^* - s_i$ (called the remainder).

• the remaining $k - \sum_i s_i$ seats are given to the $k - \sum_i s_i$ parties with the highest remainders $\delta_i$.

³ Other common choices are the Droop quota $1 + \frac{n}{1+k}$, the Hagenbach-Bischoff quota $\frac{n}{1+k}$ and the Imperiali quota $\frac{n}{2+k}$.

Below we show that the largest remainder methods select a seat distribution $(k_1, k_2, \dots)$ minimizing
$\max_i (s_i^* - k_i)$, which in the case of Hamilton comes down to minimizing $\max_i (k \frac{n_i}{n} - k_i)$.
After defining $\pi_i = \frac{n_i}{n}$ for all $i$, we obtain the result showing that our problem, with any of the three
variants of loss functions, generalizes the Hamilton apportionment method.
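The largest remainder method described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `largest_remainder` and the tie-breaking rule (lower party index first) are assumptions of the sketch, and the leftover-seat step is only meaningful for quotas such as the Hare quota, where the floors never exceed $k$ seats in total.

```python
from fractions import Fraction
from math import floor


def largest_remainder(votes, k, quota=None):
    """Seats per party under the largest remainder method.

    votes: list of vote counts n_i; k: number of seats.
    quota: defaults to the Hare quota n / k (the Hamilton method).
    Ties between equal remainders are broken by party index,
    which is an assumption of this sketch.
    """
    n = sum(votes)
    q = Fraction(n, k) if quota is None else Fraction(quota)
    ideal = [Fraction(v) / q for v in votes]      # s_i^* = n_i / q
    seats = [floor(s) for s in ideal]             # s_i = floor(s_i^*)
    remainders = [s - f for s, f in zip(ideal, seats)]
    leftover = k - sum(seats)
    # Parties ordered by decreasing remainder, then by index.
    order = sorted(range(len(votes)), key=lambda i: (-remainders[i], i))
    for i in order[:leftover]:
        seats[i] += 1
    return seats


print(largest_remainder([7, 2, 1], 3))  # [2, 1, 0]
```

In the usage line, the ideal seat numbers are 2.1, 0.6 and 0.3, so the first party gets two seats from the floor step and the single leftover seat goes to the second party, which has the largest remainder.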

Proposition 1 When $p = 1$ and assuming there are at least $k$ items for each attribute value, optimal subsets for
$\|\cdot\|_1$, $\|\cdot\|_{1,\max}$ and $\|\cdot\|_{\max}$ coincide, and correspond to the subsets given by the Hamilton apportionment
method.

Proof. Note that $\|\cdot\|_{1,\max}$ and $\|\cdot\|_{\max}$ are equivalent for $p = 1$. Recall that $s_j^*$ denotes the target number
of seats for party $j$. Let $A$ be a committee of size $k$ and let $R^j(A) = k\, r^j(A)$ be the number of members of
$A$ that belong to party $j$. Since $|R^j(A) - s_j^*| = k\,|r^j(A) - \pi^j|$, we need to show that the following three
assertions are equivalent:

1. $A$ minimizes $\sum_j |R^j(A) - s_j^*|$.

2. $A$ minimizes $\max_j |R^j(A) - s_j^*|$.

3. $A$ is a Hamilton committee.

We first show 1 ⇒ 3. Assume $A$ is not a Hamilton committee: then there exists an attribute value (party) that
receives strictly more or strictly fewer seats than it would receive according to the Hamilton method. Naturally,
there must also exist an attribute value that receives strictly fewer or strictly more seats, respectively. Formally, this
means that there are two attribute values (parties), say 1 and 2, such that the target numbers of seats for parties
1 and 2 are $s_1^* = p + \alpha_1$ and $s_2^* = q + \alpha_2$, with $p, q$ integers and $1 > \alpha_2 > \alpha_1 \ge 0$, and such that
$R^1(A) \ge p + 1$ and $R^2(A) \le q$. We have

$\sum_j |R^j(A) - s_j^*| = \sum_{j \neq 1,2} |R^j(A) - s_j^*| + |R^1(A) - s_1^*| + |R^2(A) - s_2^*| \ge \sum_{j \neq 1,2} |R^j(A) - s_j^*| + (1 - \alpha_1) + \alpha_2$.

Consider the committee $A'$ obtained from $A$ by giving one less seat to 1 and one more to 2.

• If $R^1(A) > p + 1$ then $\sum_j |R^j(A) - s_j^*| - \sum_j |R^j(A') - s_j^*| = |R^1(A) - s_1^*| - |R^1(A') - s_1^*| +
|R^2(A) - s_2^*| - |R^2(A') - s_2^*| \ge 1 + \alpha_2 - (1 - \alpha_2) > 0$.

• If $R^2(A) < q$ then similarly, $\sum_j |R^j(A) - s_j^*| - \sum_j |R^j(A') - s_j^*| > 0$.

• If $R^1(A) = p + 1$ and $R^2(A) = q$ then we have $\sum_j |R^j(A) - s_j^*| = \sum_{j \neq 1,2} |R^j(A) - s_j^*| + (1 - \alpha_1) + \alpha_2$
and $\sum_j |R^j(A') - s_j^*| = \sum_{j \neq 1,2} |R^j(A') - s_j^*| + (1 - \alpha_2) + \alpha_1$, hence $\sum_j |R^j(A) - s_j^*| - \sum_j |R^j(A') - s_j^*| = 2(\alpha_2 - \alpha_1) > 0$.

In all three cases, $A$ does not minimize $\sum_j |R^j(A) - s_j^*|$ and is therefore not an optimal committee for
$\|\cdot\|_1$.

We now show 2 ⇒ 3. Call a party $i$ lucky if $R^i(A) > s_i^*$ and unlucky if $R^i(A) < s_i^*$. Then we
have $\max_i |R^i(A) - s_i^*| = \max(0, \max\{R^i(A) - s_i^* \mid i \text{ lucky}\}, \max\{s_i^* - R^i(A) \mid i \text{ unlucky}\})$. Let, without
loss of generality, 1 be the lucky party with the highest value $R^1(A) - s_1^*$ (if there are several such parties, we take
an arbitrary one of them) and 2 be the unlucky party with the highest value $s_2^* - R^2(A)$. Assume
$A$ is not a Hamilton committee: then 2 had a higher remainder than 1 before 1 got its last seat, that is,
$s_2^* - R^2(A) > s_1^* - (R^1(A) - 1)$. Let $A'$ be the committee obtained from $A$ by giving one less seat
to 1 and one more to 2: then either $A'$ is a Hamilton committee, or it is not, and in this case we repeat the
operation until we get a Hamilton committee $A^*$. Because $\max_j |R^j(A^*) - s_j^*| < \max_j |R^j(A) - s_j^*|$, $A$ is
not an optimal committee for $\|\cdot\|_{\max}$.

It remains to be shown that if $A$ is a Hamilton committee then it is optimal both for $\|\cdot\|_1$ and for $\|\cdot\|_{\max}$.
If there is a unique Hamilton-optimal committee then this follows immediately from 1 ⇒ 3 and 2 ⇒ 3.
Assume there are several Hamilton-optimal committees $A_1, \dots, A_g$. Then there are $q$ parties, w.l.o.g.
$1, \dots, q$, with equal remainders $\alpha \in [0, 1)$, that is, $s_1^* = p_1 + \alpha, \dots, s_q^* = p_q + \alpha$, and the Hamilton-optimal
committees differ only in the choice of which of these $q$ parties receive an extra seat. We easily check
that for any two of these committees $A, A'$ we have $\|\pi, r(A)\|_1 = \|\pi, r(A')\|_1$ and $\|\pi, r(A)\|_{\max} = \|\pi, r(A')\|_{\max}$. □

Therefore, our model can be seen as a generalization of the Hamilton apportionment method to more than
one attribute. Note that our model can easily be extended to other largest remainder methods, and our results
would be easily adapted. Interestingly, when $p \ge 2$, our three criteria no longer coincide. However, for binary domains,
$\|\cdot\|_1$ and $\|\cdot\|_{1,\max}$ coincide, since $\sum_{j=1,2} |r_i^j(A) - \pi_i^j| = 2 \max_{j=1,2} |r_i^j(A) - \pi_i^j|$.

Proposition 2

1. For each $p \ge 3$ and binary domains, optimal subsets for $\|\cdot\|_1$ and $\|\cdot\|_{\max}$ may be disjoint, even for
$k = 2$.

2. For each $p \ge 3$, optimal subsets for $\|\cdot\|_{\max}$ and $\|\cdot\|_{1,\max}$ can be disjoint.

3. For each $p \ge 2$, if at least one attribute has 4 values, then optimal subsets for $\|\cdot\|_1$ and $\|\cdot\|_{1,\max}$ can
be disjoint.

4. For $p = 2$ and binary domains, optimal subsets for $\|\cdot\|_1$ and $\|\cdot\|_{\max}$ may differ.

Proof. We prove Point 1 for $p = 3$ (the proof extends easily to $p > 3$ by adding attributes on which all items,
and the target, agree). We have four candidates: two ($A$ and $B$) with attribute vectors $(x_1^2, x_2^1, x_3^1)$, and two ($C$
and $D$) with $(x_1^1, x_2^2, x_3^2)$. The target distribution is $\pi_i^1 = 0$ and $\pi_i^2 = 1$ for $i \in \{1, 2, 3\}$, and $k = 2$. The $\|\cdot\|_{\max}$-optimal
committees are $\{A, C\}$, $\{A, D\}$, $\{B, C\}$ and $\{B, D\}$. The $\|\cdot\|_1$-optimal committee is $\{C, D\}$.

For Point 2: because optimal subsets for $\|\cdot\|_1$ and $\|\cdot\|_{1,\max}$ coincide for binary domains, Point 1 implies
that optimal subsets for $\|\cdot\|_{\max}$ and $\|\cdot\|_{1,\max}$ can be disjoint. The counterexample extends easily to
non-binary domains.

For Point 3: let there be two attributes, $X_1$ with values $x_1^1, x_1^2, x_1^3, x_1^4$ and $X_2$ with values $x_2^1, x_2^2$; four
candidates: $A$ with value vector $(x_1^1, x_2^2)$, $B$ with value vector $(x_1^2, x_2^2)$, $C$ with value vector $(x_1^3, x_2^1)$, and
$D$ with value vector $(x_1^4, x_2^1)$; $k = 2$; and $\pi = (0.5, 0.5, 0, 0)$ for $X_1$ and $(0.9, 0.1)$ for $X_2$. The optimal
committees for $\|\cdot\|_1$ are all pairs except $\{C, D\}$ (with loss 1.8), while the optimal committee for $\|\cdot\|_{1,\max}$
is $\{C, D\}$ (with loss 0.6). Next, we show that $\|\cdot\|_{\max}$ and $\|\cdot\|_{1,\max}$ can be disjoint. The counterexample
extends easily to more attributes and more values.

For Point 4, let $k = 2$, with three candidates $A$, $B$ and $C$ with value vectors $(x_1^1, x_2^1)$, $(x_1^1, x_2^1)$ and $(x_1^2, x_2^2)$,
and $\pi_1^1 = 1$, $\pi_1^2 = 0$, $\pi_2^1 = 0$, and $\pi_2^2 = 1$. $\{A, B\}$, $\{A, C\}$ and $\{B, C\}$ are all $\|\cdot\|_1$-optimal, but only
$\{A, C\}$ and $\{B, C\}$ are $\|\cdot\|_{\max}$-optimal. □
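The losses quoted for Point 3 can be checked by hand. The worked computation below is a sketch under an assumption: it takes the four candidates to be $A = (x_1^1, x_2^2)$, $B = (x_1^2, x_2^2)$, $C = (x_1^3, x_2^1)$ and $D = (x_1^4, x_2^1)$, which is one assignment of value vectors consistent with the losses 1.8 and 0.6 stated in the proof, with $k = 2$, $\pi_1 = (0.5, 0.5, 0, 0)$ and $\pi_2 = (0.9, 0.1)$.

```latex
\begin{align*}
\{C, D\}:\quad & r_1 = (0, 0, \tfrac12, \tfrac12),\ r_2 = (1, 0), \\
 & \|\pi, r\|_1 = (0.5 + 0.5 + 0.5 + 0.5) + (0.1 + 0.1) = 2.2, \qquad
   \|\pi, r\|_{1,\max} = 0.5 + 0.1 = 0.6; \\
\{A, B\}:\quad & r_1 = (\tfrac12, \tfrac12, 0, 0),\ r_2 = (0, 1), \\
 & \|\pi, r\|_1 = 0 + (0.9 + 0.9) = 1.8, \qquad
   \|\pi, r\|_{1,\max} = 0 + 0.9 = 0.9; \\
\{A, C\}:\quad & r_1 = (\tfrac12, 0, \tfrac12, 0),\ r_2 = (\tfrac12, \tfrac12), \\
 & \|\pi, r\|_1 = (0 + 0.5 + 0.5 + 0) + (0.4 + 0.4) = 1.8, \qquad
   \|\pi, r\|_{1,\max} = 0.5 + 0.4 = 0.9.
\end{align*}
```

The pairs $\{A, D\}$, $\{B, C\}$ and $\{B, D\}$ behave like $\{A, C\}$ by symmetry, so under this assignment every pair other than $\{C, D\}$ has $\|\cdot\|_1$-loss 1.8 and $\|\cdot\|_{1,\max}$-loss 0.9, while $\{C, D\}$ has losses 2.2 and 0.6.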

These negative results come from the constraints imposed by the candidate database, which prevent the
selection on the different attributes from being done independently. In the example of the proof of Point 1, for
instance, since all items with the value $x_2^1$ for $X_2$ have value $x_3^1$ for $X_3$, selecting $q$ items with $X_2 = x_2^1$
implies selecting $q$ items with $X_3 = x_3^1$. However, if the database is sufficiently diverse so that no such
constraints exist, the optimization can be done separately on each attribute. This is captured by the following
notion.

Definition 5 A candidate database $C$ satisfies the Full Supply (FS) property with respect to $k$ if for any $x \in D$
there are at least $k$ candidates in $C$ associated with value vector $x$.

The candidate database of Example 1 does not satisfy FS, even for $k = 1$, because there is not a single
candidate with group C and age S. If we ignore attributes group and affiliation, then we are left with 2
(resp., 3, 2, 3) candidates with value vector FJ (resp. MJ, FS, MS): the reduced database satisfies FS for
$k \in \{1, 2\}$.
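The Full Supply property is straightforward to test on a concrete database. The sketch below, with the illustrative helper name `max_full_supply` (not from the paper), returns the largest $k$ for which FS holds by counting the candidates attached to every value vector of $D$; it is applied to the database of the Introduction and to its restriction to sex and age.

```python
from collections import Counter
from itertools import product


def max_full_supply(vectors, domains):
    """Largest k for which the database satisfies Full Supply.

    vectors: one tuple of attribute values per candidate.
    domains: one list of possible values per attribute.
    Returns 0 when some value vector of D has no candidate at all.
    """
    counts = Counter(vectors)
    return min(counts[x] for x in product(*domains))


# Candidate database of the Introduction: (sex, group, age, affiliation).
DATABASE = [
    ("F", "A", "J", "L"), ("M", "A", "J", "E"), ("M", "A", "S", "L"),
    ("F", "B", "S", "E"), ("M", "A", "S", "L"), ("M", "A", "S", "E"),
    ("F", "B", "S", "E"), ("M", "B", "J", "E"), ("M", "C", "J", "E"),
    ("F", "C", "J", "L"),
]
DOMAINS = [["F", "M"], ["A", "B", "C"], ["J", "S"], ["L", "E"]]

print(max_full_supply(DATABASE, DOMAINS))  # 0

# Keep only sex (index 0) and age (index 2).
reduced = [(sex, age) for sex, _, age, _ in DATABASE]
print(max_full_supply(reduced, [DOMAINS[0], DOMAINS[2]]))  # 2
```

Note that the check enumerates all of $D$, whose size is exponential in the number of attributes, so it is only meant for small illustrative instances.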
Proposition 3 Let $(X, C, \pi, k)$ be an optimal committee selection problem. If $C$ satisfies FS w.r.t. $k$, then the
following statements are equivalent:

• $A$ is an optimal committee for $\|\cdot\|_1$

• $A$ is an optimal committee for $\|\cdot\|_{1,\max}$

• for any attribute $X_i$, $A$ is a Hamilton committee for the single-attribute problem obtained by projecting
$D$ on $\{X_i\}$, with target $\pi_i$ and committee size $k$.

Moreover, any $\|\cdot\|_1$ (and $\|\cdot\|_{1,\max}$) optimal committee is optimal for $\|\cdot\|_{\max}$. (The converse does not always
hold.)

Proof. For each attribute $X_i$ and value $x_i^j \in D_i$, let $R_i^j$ be the number of seats with value $x_i^j$ given
by the Hamilton method for the single-attribute problem restricted to $X_i$. For all $j = 1, \dots, k$, let
$t_i(j) = \min\{l \mid R_i^1 + \dots + R_i^{l-1} < j \text{ and } R_i^1 + \dots + R_i^l \ge j\}$. Then take as item $c_j$ any item in the
database with value vector $(x_1^{t_1(j)}, \dots, x_p^{t_p(j)})$, and remove it from the database; the full supply assumption
guarantees that it will always be possible to find such an item. Let $A = \{c_1, \dots, c_k\}$; it is easy to check that
$A$ is an optimal committee for $\|\cdot\|_1$ and for $\|\cdot\|_{1,\max}$. □

To illustrate the constructive proof, consider 2 attributes: $X_1$ with 3 values $x_1^1, x_1^2, x_1^3$, and $X_2$ with 2 values
$x_2^1, x_2^2$; $k = 4$; and $R_1^1 = 2$, $R_1^2 = 0$, $R_1^3 = 2$, $R_2^1 = 3$, $R_2^2 = 1$. Then $t_1(1) = t_1(2) = 1$, $t_1(3) = t_1(4) = 3$,
$t_2(1) = t_2(2) = t_2(3) = 1$, $t_2(4) = 2$, which leads us to choose $c_1$ with value vector $(x_1^1, x_2^1)$, $c_2$ with vector
$(x_1^1, x_2^1)$, $c_3$ with vector $(x_1^3, x_2^1)$, and $c_4$ with vector $(x_1^3, x_2^2)$.
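The seat-filling rule of the constructive proof can be written down directly. The following is a small sketch; the function names `slot_value` and `target_vectors` are illustrative, and values are referred to by their 1-based index within each domain. It reproduces the value vectors of the illustration from the per-attribute Hamilton seat counts.

```python
def slot_value(counts, j):
    """t_i(j): 1-based index of the value that fills seat j.

    counts: Hamilton seat counts R_i^1, ..., R_i^{q_i} for one attribute.
    Seat j goes to the first value l whose cumulative count reaches j.
    """
    total = 0
    for l, c in enumerate(counts, start=1):
        total += c
        if total >= j:
            return l
    raise ValueError("seat index exceeds the number of seats")


def target_vectors(counts_per_attribute, k):
    """Value vectors (as 1-based value indices) required for seats 1..k."""
    return [tuple(slot_value(counts, j) for counts in counts_per_attribute)
            for j in range(1, k + 1)]


print(target_vectors([[2, 0, 2], [3, 1]], 4))
# [(1, 1), (1, 1), (3, 1), (3, 2)]
```

Under full supply, each of the returned value vectors can be matched to a distinct candidate of the database, which is the step the proof relies on.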

5 Properties of multi-attribute proportional representation 

Several properties of apportionment methods have been studied, starting with Balinski and Young. We
omit their definition in the single-attribute case and directly give their generalizations to our more general
model. Let $A$ be any optimal committee for some criterion given $\pi$, $C$ and $k$. We recall that $R_i^j(A) = k\, r_i^j(A)$
denotes the number of elements of $A$ with the attribute $X_i$ equal to $x_i^j$.

• Non-reversal: for any attribute $X_i$, and attribute values $x_i^j$, $x_i^{j'}$, if $\pi_i^j > \pi_i^{j'}$ then $r_i^j(A) \ge r_i^{j'}(A)$.

• Exactness and respect of quota: for all $i, j$, either $R_i^j(A) = \lfloor k \pi_i^j \rfloor$ or $R_i^j(A) = \lceil k \pi_i^j \rceil$.

• Population monotonicity (with respect to $X_i$): consider $\pi$ and $\rho$ such that (a) $\pi_i^j > \rho_i^j$, (b) for all
$j', j'' \neq j$, $\frac{\pi_i^{j'}}{\pi_i^{j''}} = \frac{\rho_i^{j'}}{\rho_i^{j''}}$, and (c) for all $i' \neq i$ and all $j$, $\rho_{i'}^j = \pi_{i'}^j$. Then there is an optimal committee $B$
for $\rho$ such that $r_i^j(A) \ge r_i^j(B)$.

• House monotonicity: let $B$ be an optimal committee for $\pi$, $C$ and $k' > k$. Then for all $i, j$, $R_i^j(B) \ge
R_i^j(A)$.⁴

In the single-attribute case, it has long been known that the Hamilton method satisfies all these properties
except house monotonicity (this failure of house monotonicity is better known as the Alabama
paradox).
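As an illustration of the Alabama paradox, consider a small textbook-style instance; the vote counts are chosen for this sketch and do not come from the paper. Three parties receive $(6, 6, 2)$ votes, so $n = 14$, and the Hamilton method is run with $k = 10$ and then $k = 11$ seats.

```latex
\begin{align*}
k = 10:\quad & s^* = \left(\tfrac{60}{14}, \tfrac{60}{14}, \tfrac{20}{14}\right) \approx (4.29,\ 4.29,\ 1.43), &
  \lfloor s^* \rfloor &= (4, 4, 1), & \text{seats} &= (4, 4, 2), \\
k = 11:\quad & s^* = \left(\tfrac{66}{14}, \tfrac{66}{14}, \tfrac{22}{14}\right) \approx (4.71,\ 4.71,\ 1.57), &
  \lfloor s^* \rfloor &= (4, 4, 1), & \text{seats} &= (5, 5, 1).
\end{align*}
```

With $k = 10$ the single leftover seat goes to the third party (remainder 0.43 against 0.29). With $k = 11$ the two leftover seats go to the first two parties (remainder 0.71 against 0.57), so the third party loses a seat although the house has grown.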

We start by noticing that if a property fails to be satisfied in the single-attribute case, a fortiori it is not 
satisfied in the multi-attribute case. As a consequence, house monotonicity is not satisfied, even under the FS 
assumption. We now consider the other properties. 

⁴ Some other properties, such as consistency, seem more difficult to generalize to the multi-attribute case. Also, properties that deal
with strategyproofness issues, such as resistance to party merging or party splitting, are less relevant in our setting than for political
elections and we omit them.
Proposition 4 Under the full supply assumption, non-reversal, exactness and respect of quota, and population
monotonicity are all satisfied, for any of our loss functions. In the general case, non-reversal, exactness
and respect of quota are not satisfied. If $X_i$ is a binary variable, and for $\|\cdot\|_1$, population monotonicity with
respect to $X_i$ is satisfied; however it is not satisfied in the general case.


Proof. Under FS, the result easily comes from Proposition 3 and the fact that the properties hold in the
single-attribute case.

In the general case, we give counterexamples. For exactness and respect of quota, we have two binary
attributes, and two items $a, b$ with value vectors $(x_1^2, x_2^2)$ and $(x_1^1, x_2^1)$, $k = 1$, and $\pi$ defined as $\pi_1^1 = 0$, $\pi_1^2 = 1$,
$\pi_2^1 = 1$, $\pi_2^2 = 0$. The optimal committee is either $\{a\}$ or $\{b\}$, and does not respect quota even though all
values $k \pi_i^j$ are integers.

For non-reversal we have two binary attributes and six items: $a, b, c$, each with vector $(x_1^1, x_2^1)$, and $d, e, f$,
each with vector $(x_1^2, x_2^2)$. We have a target distribution $\pi$ defined as follows: $\pi_1^1 = 0.35$, $\pi_1^2 = 0.65$, $\pi_2^1 = 1$,
$\pi_2^2 = 0$. We set $k = 3$. The optimal committees for $\|\cdot\|_1$ and $\|\cdot\|_{1,\max}$ are $\{a, b, c\}$ and all triples made up
from two items out of $\{a, b, c\}$ and one out of $\{d, e, f\}$. The optimal committees for $\|\cdot\|_{\max}$ are all triples
made up from two items out of $\{a, b, c\}$ and one out of $\{d, e, f\}$. In all cases, for all optimal committees $A$
we have $r_1^1(A) > r_1^2(A)$ although $\pi_1^1 < \pi_1^2$.
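The non-reversal counterexample is small enough to check by exhaustive search. The sketch below assumes that items $a, b, c$ carry the first value on both attributes and $d, e, f$ the second, with $\pi_1 = (0.35, 0.65)$, $\pi_2 = (1, 0)$ and $k = 3$; the names `ITEMS`, `TARGET`, `LOSSES` and `optimal` are illustrative. It enumerates all triples, keeps those of minimum loss for each criterion, and checks the reversal on attribute $X_1$.

```python
from fractions import Fraction as F
from itertools import combinations

# Items a, b, c carry value 1 on both attributes; d, e, f carry value 2.
ITEMS = {**{name: (1, 1) for name in "abc"}, **{name: (2, 2) for name in "def"}}
TARGET = [{1: F(35, 100), 2: F(65, 100)}, {1: F(1), 2: F(0)}]
K = 3


def representation(committee, i, value):
    """r_i^value(A): fraction of the committee with this value on attribute i."""
    return F(sum(ITEMS[c][i] == value for c in committee), len(committee))


def deviations(committee):
    """Per attribute, the list of |r_i^j(A) - pi_i^j| over the values j."""
    return [[abs(representation(committee, i, v) - p) for v, p in pi_i.items()]
            for i, pi_i in enumerate(TARGET)]


LOSSES = {
    "1": lambda d: sum(map(sum, d)),
    "1,max": lambda d: sum(map(max, d)),
    "max": lambda d: max(map(max, d)),
}


def optimal(loss_name):
    """All committees of size K with minimum loss, by brute force."""
    loss = LOSSES[loss_name]
    scored = {c: loss(deviations(c)) for c in combinations(sorted(ITEMS), K)}
    best = min(scored.values())
    return [c for c, s in scored.items() if s == best]


for name in LOSSES:
    for committee in optimal(name):
        # Value 1 of X_1 is over-represented although pi_1^1 < pi_1^2.
        assert representation(committee, 0, 1) > representation(committee, 0, 2)
    print(name, len(optimal(name)))
```

Brute force is used here only because the instance has 20 triples; it is not meant as a practical algorithm for the general problem.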

Now, we prove that population monotonicity holds for binary domains and for $\|\cdot\|_1$. Consider a binary
attribute $X_i$, with $D_i = \{x_i^0, x_i^1\}$.

Assume that $\rho_i^0 > \pi_i^0$ (and $\rho_i^1 < \pi_i^1$) and that for all $i' \neq i$ we have $\rho_{i'} = \pi_{i'}$. Let $A$ be an optimal
committee for $\pi$ and, for the sake of contradiction, assume that for all optimal committees $B$ for $\rho$ we
have $r_i^0(B) < r_i^0(A)$. Let $B$ be such a committee. The proof is a case by case study, with six cases
to be considered: (C1) $r_i^0(B) \le \pi_i^0 < \rho_i^0 \le r_i^0(A)$; (C2) $\pi_i^0 \le r_i^0(B) \le \rho_i^0 \le r_i^0(A)$; (C3) $\pi_i^0 <
\rho_i^0 \le r_i^0(B) < r_i^0(A)$; (C4) $r_i^0(B) \le \pi_i^0 \le r_i^0(A) \le \rho_i^0$; (C5) $\pi_i^0 \le r_i^0(B) < r_i^0(A) \le \rho_i^0$; and (C6)
$r_i^0(B) < r_i^0(A) \le \pi_i^0 < \rho_i^0$.

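Case 1: $r_i^0(B) \le \pi_i^0 < \rho_i^0 \le r_i^0(A)$. In this case we have $r_i^1(B) \ge \pi_i^1 > \rho_i^1 \ge r_i^1(A)$ and the
following holds:

\begin{align}
\|r(B) - \pi\|_1 &= \sum_{i' \neq i} \sum_j |r_{i'}^j(B) - \pi_{i'}^j| + (\pi_i^0 - r_i^0(B)) + (r_i^1(B) - \pi_i^1) \tag{1} \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(B) - \rho_{i'}^j| + (\rho_i^0 - r_i^0(B)) + (r_i^1(B) - \rho_i^1) \notag \\
&\quad + \pi_i^0 - \pi_i^1 - \rho_i^0 + \rho_i^1 \tag{2} \\
&= \|r(B) - \rho\|_1 + 2(\pi_i^0 - \rho_i^0) \tag{3} \\
&< \|r(A) - \rho\|_1 + 2(\pi_i^0 - \rho_i^0) \tag{4} \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(A) - \rho_{i'}^j| + (r_i^0(A) - \rho_i^0) + (\rho_i^1 - r_i^1(A)) + 2(\pi_i^0 - \rho_i^0) \tag{5} \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(A) - \pi_{i'}^j| + (r_i^0(A) - \pi_i^0) + (\pi_i^1 - r_i^1(A)) \notag \\
&\quad + \pi_i^0 - \pi_i^1 - \rho_i^0 + \rho_i^1 + 2(\pi_i^0 - \rho_i^0) \tag{6} \\
&= \|r(A) - \pi\|_1 + 4(\pi_i^0 - \rho_i^0) \tag{7} \\
&< \|r(A) - \pi\|_1 \tag{8}
\end{align}

Inequality (4) comes from the fact that $A$ is not optimal for $\rho$, whereas $B$ is. Since there is a strict inequality
in the sequence, we infer that $A$ is not optimal for $\pi$, a contradiction.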


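Case 2: $\pi_i^0 \le r_i^0(B) \le \rho_i^0 \le r_i^0(A)$.

\begin{align*}
\|r(B) - \pi\|_1 &= \sum_{i' \neq i} \sum_j |r_{i'}^j(B) - \pi_{i'}^j| + (r_i^0(B) - \pi_i^0) + (\pi_i^1 - r_i^1(B)) \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(B) - \rho_{i'}^j| + (\rho_i^0 - r_i^0(B)) + (r_i^1(B) - \rho_i^1) \\
&\quad + 2r_i^0(B) - \pi_i^0 - \rho_i^0 - 2r_i^1(B) + \pi_i^1 + \rho_i^1 \\
&= \|r(B) - \rho\|_1 + 4r_i^0(B) - 2\pi_i^0 - 2\rho_i^0 \\
&< \|r(A) - \rho\|_1 + 4r_i^0(B) - 2\pi_i^0 - 2\rho_i^0 \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(A) - \rho_{i'}^j| + (r_i^0(A) - \rho_i^0) + (\rho_i^1 - r_i^1(A)) + 4r_i^0(B) - 2\pi_i^0 - 2\rho_i^0 \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(A) - \pi_{i'}^j| + (r_i^0(A) - \pi_i^0) + (\pi_i^1 - r_i^1(A)) \\
&\quad + \pi_i^0 - \rho_i^0 - \pi_i^1 + \rho_i^1 + 4r_i^0(B) - 2\pi_i^0 - 2\rho_i^0 \\
&= \|r(A) - \pi\|_1 + 4r_i^0(B) - 4\rho_i^0 \\
&\le \|r(A) - \pi\|_1
\end{align*}

Again we obtain a contradiction.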

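Case 3: $\pi_i^0 < \rho_i^0 \le r_i^0(B) < r_i^0(A)$.

\begin{align*}
\|r(B) - \pi\|_1 &= \sum_{i' \neq i} \sum_j |r_{i'}^j(B) - \pi_{i'}^j| + (r_i^0(B) - \pi_i^0) + (\pi_i^1 - r_i^1(B)) \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(B) - \rho_{i'}^j| + (r_i^0(B) - \rho_i^0) + (\rho_i^1 - r_i^1(B)) \\
&\quad - \pi_i^0 + \rho_i^0 + \pi_i^1 - \rho_i^1 \\
&= \|r(B) - \rho\|_1 - 2\pi_i^0 + 2\rho_i^0 \\
&< \|r(A) - \rho\|_1 - 2\pi_i^0 + 2\rho_i^0 \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(A) - \rho_{i'}^j| + (r_i^0(A) - \rho_i^0) + (\rho_i^1 - r_i^1(A)) - 2\pi_i^0 + 2\rho_i^0 \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(A) - \pi_{i'}^j| + (r_i^0(A) - \pi_i^0) + (\pi_i^1 - r_i^1(A)) \\
&\quad + \pi_i^0 - \rho_i^0 - \pi_i^1 + \rho_i^1 - 2\pi_i^0 + 2\rho_i^0 \\
&= \|r(A) - \pi\|_1
\end{align*}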

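Case 4: $r_i^0(B) \le \pi_i^0 \le r_i^0(A) \le \rho_i^0$.

\begin{align*}
\|r(B) - \pi\|_1 &= \sum_{i' \neq i} \sum_j |r_{i'}^j(B) - \pi_{i'}^j| + (\pi_i^0 - r_i^0(B)) + (r_i^1(B) - \pi_i^1) \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(B) - \rho_{i'}^j| + (\rho_i^0 - r_i^0(B)) + (r_i^1(B) - \rho_i^1) \\
&\quad + \pi_i^0 - \rho_i^0 - \pi_i^1 + \rho_i^1 \\
&= \|r(B) - \rho\|_1 + 2\pi_i^0 - 2\rho_i^0 \\
&< \|r(A) - \rho\|_1 + 2\pi_i^0 - 2\rho_i^0 \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(A) - \rho_{i'}^j| + (\rho_i^0 - r_i^0(A)) + (r_i^1(A) - \rho_i^1) + 2\pi_i^0 - 2\rho_i^0 \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(A) - \pi_{i'}^j| + (r_i^0(A) - \pi_i^0) + (\pi_i^1 - r_i^1(A)) \\
&\quad - 2r_i^0(A) + 2r_i^1(A) + \pi_i^0 + \rho_i^0 - \pi_i^1 - \rho_i^1 + 2\pi_i^0 - 2\rho_i^0 \\
&= \|r(A) - \pi\|_1 - 4r_i^0(A) + 4\pi_i^0 \\
&\le \|r(A) - \pi\|_1
\end{align*}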

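Case 5: $\pi_i^0 \le r_i^0(B) < r_i^0(A) \le \rho_i^0$.

\begin{align*}
\|r(B) - \pi\|_1 &= \sum_{i' \neq i} \sum_j |r_{i'}^j(B) - \pi_{i'}^j| + (r_i^0(B) - \pi_i^0) + (\pi_i^1 - r_i^1(B)) \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(B) - \rho_{i'}^j| + (\rho_i^0 - r_i^0(B)) + (r_i^1(B) - \rho_i^1) \\
&\quad + 2r_i^0(B) - 2r_i^1(B) - \pi_i^0 - \rho_i^0 + \pi_i^1 + \rho_i^1 \\
&= \|r(B) - \rho\|_1 + 4r_i^0(B) - 2\pi_i^0 - 2\rho_i^0 \\
&< \|r(A) - \rho\|_1 + 4r_i^0(B) - 2\pi_i^0 - 2\rho_i^0 \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(A) - \rho_{i'}^j| + (\rho_i^0 - r_i^0(A)) + (r_i^1(A) - \rho_i^1) + 4r_i^0(B) - 2\pi_i^0 - 2\rho_i^0 \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(A) - \pi_{i'}^j| + (r_i^0(A) - \pi_i^0) + (\pi_i^1 - r_i^1(A)) \\
&\quad + 4r_i^0(B) - 2r_i^0(A) + 2r_i^1(A) + \pi_i^0 + \rho_i^0 - \pi_i^1 - \rho_i^1 - 2\pi_i^0 - 2\rho_i^0 \\
&= \|r(A) - \pi\|_1 + 4r_i^0(B) - 4r_i^0(A) \\
&< \|r(A) - \pi\|_1
\end{align*}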

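Case 6: $r_i^0(B) < r_i^0(A) \le \pi_i^0 < \rho_i^0$.

\begin{align*}
\|r(B) - \pi\|_1 &= \sum_{i' \neq i} \sum_j |r_{i'}^j(B) - \pi_{i'}^j| + (\pi_i^0 - r_i^0(B)) + (r_i^1(B) - \pi_i^1) \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(B) - \rho_{i'}^j| + (\rho_i^0 - r_i^0(B)) + (r_i^1(B) - \rho_i^1) \\
&\quad + \pi_i^0 - \rho_i^0 - \pi_i^1 + \rho_i^1 \\
&= \|r(B) - \rho\|_1 + 2\pi_i^0 - 2\rho_i^0 \\
&< \|r(A) - \rho\|_1 + 2\pi_i^0 - 2\rho_i^0 \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(A) - \rho_{i'}^j| + (\rho_i^0 - r_i^0(A)) + (r_i^1(A) - \rho_i^1) + 2\pi_i^0 - 2\rho_i^0 \\
&= \sum_{i' \neq i} \sum_j |r_{i'}^j(A) - \pi_{i'}^j| + (\pi_i^0 - r_i^0(A)) + (r_i^1(A) - \pi_i^1) \\
&\quad - \pi_i^0 + \rho_i^0 + \pi_i^1 - \rho_i^1 + 2\pi_i^0 - 2\rho_i^0 \\
&= \|r(A) - \pi\|_1
\end{align*}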

Finally, we give an example showing that population monotonicity does not hold in the general case for $\|\cdot\|_1$. First, we describe the set of attributes. We have one distinguished attribute $X_1$ with 5 possible values $x_1^1, \ldots, x_1^5$, and 64 groups of binary attributes, indexed by the pairs of integers $i, j \in \{1, \ldots, 8\}$. These groups of attributes are denoted $X_{(1,1)}, X_{(1,2)}, \ldots, X_{(8,8)}$. Each group contains some large number $\lambda$ of indistinguishable attributes, each having the same set of possible values $\{x^1, x^2\}$. We have 16 alternatives $A_1, A_2, \ldots, A_8$ and $B_1, B_2, \ldots, B_8$, and our goal is to select a subset of $k = 8$ of them.

We start by describing these alternatives on the binary attributes: each alternative $A_i$ has the value $x^1$ on all attributes from the groups $X_{(i,j)}$, $j \in \{1, \ldots, 8\}$, and the value $x^2$ on all the remaining ones; each alternative $B_j$ has the value $x^1$ on all attributes from the groups $X_{(i,j)}$, $i \in \{1, \ldots, 8\}$, and the value $x^2$ on all the remaining ones. For the binary attributes we set the target probabilities to $\pi^1 = 1/8$ and $\pi^2 = 7/8$. Due to this construction, the only two subsets that perfectly agree with the target distributions on each of the binary attributes are $A = \{A_1, A_2, \ldots, A_8\}$ and $B = \{B_1, B_2, \ldots, B_8\}$. Indeed, every subset $S$ including some $A_i$ and some $B_j$ would have $r^1(S) \ge 1/4$ at least for the group of attributes $X_{(i,j)}$. Since $\lambda$ is large, we infer that, independently of what happens on the distinguished attribute $X_1$, the only possible winning committee is either $A = \{A_1, A_2, \ldots, A_8\}$ or $B = \{B_1, B_2, \ldots, B_8\}$.

Next, let us describe what happens on the attribute $X_1$. The vector $r_1(A)$ is equal to $(1/2, 0, 1/2, 0, 0)$. For the committee $B$, we have $r_1(B) = (1/4, 1/4, 1/4, 1/8, 1/8)$, and the vector of target distributions for $X_1$ is $\pi_1 = (0, 0, 3/8 + \epsilon, 5/8 - \epsilon, 0)$. We can see that $\|r(A) - \pi\|_1 = 1/2 + 1/8 - \epsilon + 5/8 - \epsilon = 1.25 - 2\epsilon$. Since $\|r(B) - \pi\|_1 = 1/4 + 1/4 + 1/8 + \epsilon + 4/8 - \epsilon + 1/8 = 1.25$, we get that $A$ is a winning committee. However, if we modify the target fractions so that $\rho_1 = (1/4, 0, 9/32 + \epsilon_1, 15/32 - \epsilon_2, 0)$, we get $\|r(A) - \rho\|_1 = 1/4 + 7/32 - \epsilon_1 + 15/32 - \epsilon_2 = 30/32 - \epsilon_1 - \epsilon_2$ and $\|r(B) - \rho\|_1 = 1/4 + 1/32 + \epsilon_1 + 11/32 - \epsilon_2 + 1/8 = 24/32 + \epsilon_1 - \epsilon_2$; thus, $B$ is winning according to $\rho$. However, $B$ has a lower representation of $x_1^1$ than $A$, and $\rho$ was obtained from $\pi$ by increasing the fraction $\pi_1^1$. This completes the proof.
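
The arithmetic of this counterexample on the distinguished attribute can be rechecked mechanically. The following is a minimal sketch, not part of the original proof: it uses exact fractions, the helper name `l1` is ours, and it assumes $\epsilon = \epsilon_1 = \epsilon_2 = 1/100$ (any small positive value behaves the same way).

```python
from fractions import Fraction as F

def l1(freqs, target):
    """L1 distance between value frequencies and a target distribution."""
    return sum(abs(x - t) for x, t in zip(freqs, target))

eps = F(1, 100)  # illustrative choice; stands in for epsilon, epsilon_1, epsilon_2

r_A = [F(1, 2), F(0), F(1, 2), F(0), F(0)]            # frequencies of committee A on X_1
r_B = [F(1, 4), F(1, 4), F(1, 4), F(1, 8), F(1, 8)]   # frequencies of committee B on X_1
pi = [F(0), F(0), F(3, 8) + eps, F(5, 8) - eps, F(0)]
rho = [F(1, 4), F(0), F(9, 32) + eps, F(15, 32) - eps, F(0)]

# Under pi, committee A is closer; under rho, committee B is closer.
print(l1(r_A, pi), l1(r_B, pi))
print(l1(r_A, rho), l1(r_B, rho))
```

The binary attributes are omitted because both $A$ and $B$ match their targets exactly there.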

□ 


Other properties, specific to multi-attribute proportional representation, could also be considered, for 
instance by adapting properties studied by Elkind et al. M- One such property is candidate monotonicity 
(if we add more candidates to the database, the new committee must be at least as good as the old one). We 
leave this for further research. 


6 Computing Optimal Committees 


In this section we investigate the computational complexity of finding optimal committees. We start by observing that the problem of deciding whether there is a perfect committee for a given instance is NP-complete.


Proposition 5 Given a set of attributes $X$, a set of candidates $C$, a vector of target distributions $\pi$, and an integer $k$, deciding whether there is a perfect committee is NP-complete.


Proof. Membership is straightforward. Hardness follows by reduction from the NP-complete problem EXACT COVER WITH 3-SETS, or X3C [[Ijl-. Let $I = (X, S)$ with $X = \{x_1, \ldots, x_{3k}\}$ and $S = \{S_1, \ldots, S_m\}$ with $|S_i| = 3$ for each $i$. $I$ is a positive instance of X3C iff there is a collection $S' \subseteq S$ with $|S'| = k$ and $\bigcup \{S_j \mid S_j \in S'\} = X$. Define the following instance of PERFECT COMMITTEE: let $X_1, \ldots, X_{3k}$ be $3k$ binary attributes, and let $C$ consist of $m$ candidates $c_1, \ldots, c_m$ with $X_i(c_j) = 1$ if $x_i \in S_j$ and $X_i(c_j) = 0$ if $x_i \notin S_j$. Finally, for each $i$, $\pi_i(0) = \frac{k-1}{k}$ and $\pi_i(1) = \frac{1}{k}$. We want a committee of size $k$. $A = \{c_{i_1}, \ldots, c_{i_k}\}$ is perfect for $\pi$ iff for each $X_i$ there is exactly one $j \in \{1, \ldots, k\}$ such that $X_i(c_{i_j}) = 1$, which is equivalent to saying that for each $x_i$ there is exactly one $S_j \in \{S_{i_1}, \ldots, S_{i_k}\}$ such that $x_i \in S_j$. Thus, there is a perfect committee for $\pi$ and $C$ if and only if $I$ is a positive instance. □
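
To make the reduction concrete, here is a small illustrative sketch (function names are ours, not from the paper). It builds the PERFECT COMMITTEE instance from an X3C instance and decides it by brute force, which is exponential in general and only meant for toy inputs.

```python
from fractions import Fraction
from itertools import combinations

def x3c_to_committee(universe, triples):
    """Map an X3C instance to (candidates, target share of value 1, k)."""
    k = len(universe) // 3
    # Candidate c_j has X_i(c_j) = 1 exactly when x_i belongs to S_j.
    candidates = [tuple(1 if x in s else 0 for x in universe) for s in triples]
    target_one = Fraction(1, k)  # pi_i(1) = 1/k, hence pi_i(0) = 1 - 1/k
    return candidates, target_one, k

def has_perfect_committee(candidates, target_one, k):
    """Brute-force search over all size-k committees (toy instances only)."""
    n_attr = len(candidates[0])
    for committee in combinations(candidates, k):
        if all(Fraction(sum(c[i] for c in committee), k) == target_one
               for i in range(n_attr)):
            return True
    return False

universe = [1, 2, 3, 4, 5, 6]
yes_instance = [{1, 2, 3}, {4, 5, 6}, {1, 4, 5}]
no_instance = [{1, 2, 3}, {3, 4, 5}, {1, 5, 6}]
print(has_perfect_committee(*x3c_to_committee(universe, yes_instance)))
print(has_perfect_committee(*x3c_to_committee(universe, no_instance)))
```

A committee is perfect exactly when every element of the universe is covered once, which is the exact-cover condition.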

This simple result implies that the decision problem associated with finding an optimal committee (is 
there a committee whose loss is less than 0?) is NP-hard for all loss functions. However, if the number of 
attributes p is fixed, the problem is solvable in polynomial time. 

Proposition 6 Let $p$ be a constant integer. Given a set of $p$ attributes $X$, a set of candidates $C$, a vector of target distributions $\pi$, and an integer $k$, deciding whether there is a perfect committee is solvable in polynomial time.

Proof. Let $q = \max_i q_i$. Each candidate can be viewed as a vector of values indexed by the attributes; there are at most $q^p$ such possible vectors. Since the size of the input is at least $q$, the number of distinct candidates is bounded by a polynomial function of the size of the input. The rest of the proof is the same as the proof
of Theorem|4] □ 


6.1 Approximating optimal committees 

A natural approach to alleviate the NP-hardness of the problem is to analyze whether it can be well approx¬ 
imated. Before proceeding to presentation of our approximation algorithms, the core technical contribution 
of this paper, we define the notion of approximability used in our analysis. 

Definition 6 An algorithm $\mathcal{A}$ is an $\alpha$-additive-approximation algorithm for OptimalRepresentation if for each instance $I$ of OptimalRepresentation it holds that $|f(\pi, r(A)) - f(\pi, r(A^*))| \le \alpha$, where $A$ is the committee returned by $\mathcal{A}$ for $I$, and $A^*$ is an optimal committee.

It is easy to observe that for binary domains it holds that $\|\pi, r(A)\|_1 = 2\|\pi, r(A)\|_{1,\max}$. This implies that for binary domains, an $\alpha$-additive-approximation algorithm for $\|\cdot\|_1$ is an $\frac{\alpha}{2}$-additive-approximation algorithm for $\|\cdot\|_{1,\max}$.

In this paper we mostly present computational results for binary domains. However, this assumption is not 
as restrictive as it may seem—every instance of the OptimalRepresentation problem can be transformed 
to a new instance with binary domains in the following way: 

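• $X_{new} = \{X_{i,j} \mid i = 1, \ldots, p,\ j = 1, \ldots, |D_i|\}$,

• $C_{new} = \{c'_\ell \mid \ell = 1, \ldots, m\}$,

• $\pi_{new} = (\pi_{i,j} \mid 1 \le i \le p,\ 1 \le j \le |D_i|)$, where for all $\ell = 1, \ldots, m$, $i = 1, \ldots, p$ and $j = 1, \ldots, |D_i|$: $X_{i,j}(c'_\ell) = 1$ if $X_i(c_\ell) = x_i^j$ and $X_{i,j}(c'_\ell) = 0$ otherwise, $\pi_{i,j}^1 = \pi_i^j$, and $\pi_{i,j}^0 = 1 - \pi_i^j$.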
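
The transformation amounts to a one-hot encoding of every attribute. The sketch below is an illustration under our own naming (`binarize`, `l1_loss`, and the toy data are assumptions, not from the paper); it also lets one observe numerically that the $\|\cdot\|_1$ loss doubles on the transformed instance.

```python
from fractions import Fraction as F

def binarize(candidates, domains, targets):
    """Create one binary attribute X_{i,j} per (attribute i, value j)."""
    new_candidates = [
        tuple(1 if c[i] == v else 0 for i, dom in enumerate(domains) for v in dom)
        for c in candidates
    ]
    # For X_{i,j}: target of value 1 is pi_i^j, target of value 0 is 1 - pi_i^j.
    new_targets = [
        {1: targets[i][v], 0: 1 - targets[i][v]}
        for i, dom in enumerate(domains) for v in dom
    ]
    new_domains = [(0, 1)] * len(new_targets)
    return new_candidates, new_domains, new_targets

def l1_loss(committee, domains, targets):
    """Sum over attributes and values of |frequency - target|."""
    k = len(committee)
    return sum(
        abs(F(sum(1 for c in committee if c[i] == v), k) - targets[i][v])
        for i, dom in enumerate(domains) for v in dom
    )

# Hypothetical toy committee with two attributes (sex, group).
committee = [("F", "A"), ("M", "A"), ("M", "B")]
domains = [("F", "M"), ("A", "B", "C")]
targets = [{"F": F(1, 2), "M": F(1, 2)}, {"A": F(1, 2), "B": F(1, 4), "C": F(1, 4)}]

new_committee, new_domains, new_targets = binarize(committee, domains, targets)
print(l1_loss(committee, domains, targets))
print(l1_loss(new_committee, new_domains, new_targets))
```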

The following lemma shows how to obtain approximation guarantees for arbitrary domains having guar¬ 
antees for the problem transformed to binary domains. 

Lemma 1 For a given committee $A$ and target distribution $\pi$, let $A_{new}$ and $\pi_{new}$ denote the committee and target distributions obtained as above. The following holds:

1. $\|\pi_{new}, r(A_{new})\|_1 = 2\|\pi, r(A)\|_1$.
2. 1 < ll^-;;w.^(-4new)||l,max < ^ I ^ |

— ||7r,r(A)||i,xnax — MM 

3. $\max(\pi_{new}, r(A_{new})) = \max(\pi, r(A))$.

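Proof. We prove the first equality; the proof for the other two is similar.

\begin{align*}
\|\pi, r(A)\|_1 &= \sum_{i,j} \left| \frac{|\{c \in A : X_i(c) = x_i^j\}|}{k} - \pi_i^j \right|
= \sum_{i,j} \left| \frac{|\{c \in A_{new} : X_{i,j}(c) = 1\}|}{k} - \pi_{i,j}^1 \right| \\
&= \frac{1}{2} \sum_{i,j} \left( \left| \frac{|\{c \in A_{new} : X_{i,j}(c) = 1\}|}{k} - \pi_{i,j}^1 \right| + \left| \frac{|\{c \in A_{new} : X_{i,j}(c) = 0\}|}{k} - \pi_{i,j}^0 \right| \right) \\
&= \frac{1}{2} \sum_{i,j} \sum_{\ell \in \{0,1\}} \left| r_{i,j}^\ell(A_{new}) - \pi_{i,j}^\ell \right| = \frac{1}{2} \|\pi_{new}, r(A_{new})\|_1.
\end{align*}

□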


Lemma 1 has interesting implications: first, it shows that the transformed instance has the same perfect committees as the original instance; second, it shows how to obtain additive approximation guarantees for arbitrary domains from guarantees for the problem restricted to binary domains, for different loss functions.


6.2 Approximation algorithms 

In this section we show an approximation algorithm for the OptimalRepresentation problem. The algorithm is given in Figure 1 and is parameterized by an integer value $\ell$. It starts with a random collection of $k$ items and, in each step, checks whether it is possible to replace at most $\ell$ items from the current solution with at most $\ell$ other items to obtain a better solution. The algorithm continues until it cannot find any pair of sets of at most $\ell$ items whose exchange improves the current solution. As we show now, the approximation guarantees depend on the value of the parameter $\ell$.


Parameters:
    $\pi = (\pi_1, \ldots, \pi_p)$: input target distributions.
    $\ell$: the parameter of the algorithm.

$A \leftarrow$ $k$ random items from $C$;
while there exist $C_\ell \subseteq C$ and $A_\ell \subseteq A$ such that $|C_\ell| \le \ell$, $|A_\ell| \le \ell$, and $f(\pi, r(A)) > f(\pi, r((A \setminus A_\ell) \cup C_\ell))$ do
    $A \leftarrow (A \setminus A_\ell) \cup C_\ell$;
return $A$;


Figure 1: Local search approximation algorithm.
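
The procedure of Figure 1 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: it instantiates $f$ with the $\|\cdot\|_1$ loss, and it assumes that the removed and inserted sets have equal size and that inserted items come from outside the current committee, so that the committee size stays $k$. All function and parameter names are ours.

```python
from fractions import Fraction
from itertools import combinations
import random

def l1_loss(committee, targets):
    """targets[i] maps every value of attribute i to its desired share."""
    k = len(committee)
    loss = Fraction(0)
    for i, target in enumerate(targets):
        for value, share in target.items():
            freq = Fraction(sum(1 for c in committee if c[i] == value), k)
            loss += abs(freq - share)
    return loss

def improving_swap(chosen, n, ell, loss_of):
    """Return a strictly better committee reachable by swapping <= ell items, or None."""
    current = loss_of(chosen)
    outside = [j for j in range(n) if j not in chosen]
    for size in range(1, ell + 1):
        for out in combinations(sorted(chosen), size):
            for inn in combinations(outside, size):
                trial = (chosen - set(out)) | set(inn)
                if loss_of(trial) < current:
                    return trial
    return None

def local_search(candidates, targets, k, ell, seed=0):
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(candidates)), k))  # random start
    loss_of = lambda idx: l1_loss([candidates[j] for j in idx], targets)
    while True:
        better = improving_swap(chosen, len(candidates), ell, loss_of)
        if better is None:  # local optimum: no improving exchange exists
            return [candidates[j] for j in sorted(chosen)]
        chosen = better

pool = [(0,), (0,), (1,), (1,)]
goal = [{0: Fraction(1, 2), 1: Fraction(1, 2)}]
print(local_search(pool, goal, k=2, ell=1))
```

The loop terminates because the loss strictly decreases with each accepted exchange and there are finitely many committees. Each iteration inspects on the order of $\binom{k}{\ell}\binom{m-k}{\ell}$ exchanges, so the sketch is only practical for small $\ell$.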


Theorem 1 For binary domains, natural distributions, and the $\|\cdot\|_1$ loss function, the local search algorithm defined in Figure 1 with $\ell = 1$ is an $|X|$-additive-approximation algorithm for OptimalRepresentation.

Proof. Let $A^*$ denote an optimal solution for a given instance $I$ of the problem of finding a perfect committee. Let $A \in S_k(C)$ denote the set returned by the local search algorithm from Figure 1. From the condition in the "while" loop, we know that there exist no $c \in C$ and $a \in A$ such that $\|\pi, r(A)\|_1 > \|\pi, r((A \setminus \{a\}) \cup \{c\})\|_1$. Now, let $X_{ex} \subseteq X$ denote the set of all attributes for which $A$ achieves an exact match with $\pi$, that is, such that for each $X_i \in X_{ex}$ we have $r_i^0(A) = \pi_i^0$ and $r_i^1(A) = \pi_i^1$.

Let us consider the procedure that takes the items from $A \setminus A^*$ and, one by one, replaces them with arbitrary items from $A^* \setminus A$. This procedure, in $|A \setminus A^*|$ steps, transforms $A$ into an optimal solution $A^*$. We now estimate the total gain $g$ induced by this procedure. For each item $a \in A \setminus A^*$, by $a' \in A^* \setminus A$ we denote the item which was taken to replace $a$ in the procedure. For each attribute $X_i \in X$ we define the gain $g_i(a, a')$ of replacing $a$ by $a'$ as:

\[ g_i(a, a') = \sum_j \left( |r_i^j(A) - \pi_i^j| - |r_i^j((A \setminus \{a\}) \cup \{a'\}) - \pi_i^j| \right). \]

We now extend this definition to sets of candidates:

\[ g_i(B, B') = \sum_j \left( |r_i^j(A) - \pi_i^j| - |r_i^j((A \setminus B) \cup B') - \pi_i^j| \right). \]

If $X_i \in X_{ex}$, then $r_i(A) = \pi_i$, and so the replacement cannot improve the quality of the solution relative to $X_i$, hence

\[ g_i(A \setminus A^*, A^* \setminus A) \le 0. \tag{1} \]

Note that $g_i(a, a') \in \{-\frac{2}{k}, 0, \frac{2}{k}\}$. Moreover, for each attribute $X_i \notin X_{ex}$ there are two possible cases:

1. $r_i^1(A) > \pi_i^1$ and each exchange of candidates that results in a negative gain increases $r_i^1(A)$.

2. $r_i^1(A) < \pi_i^1$ and each exchange that results in a negative gain decreases $r_i^1(A)$.

Intuitively, 1. and 2. mean that for attributes outside of $X_{ex}$, the negative gains cumulate. Formally, for each $X_i \notin X_{ex}$:

\[ g_i(A \setminus A^*, A^* \setminus A) \le \sum_{a \in A \setminus A^*} g_i(a, a'). \tag{2} \]

From the condition in the "while" loop, we have that for each $a \in A \setminus A^*$: $\sum_i g_i(a, a') \le 0$, and so:

\[ \sum_i \sum_{a \in A \setminus A^*} g_i(a, a') \le 0. \tag{3} \]

We now give the following sequence of inequalities:

\begin{align*}
g = \sum_i g_i(A \setminus A^*, A^* \setminus A)
&= \sum_{X_i \in X_{ex}} g_i(A \setminus A^*, A^* \setminus A) + \sum_{X_i \notin X_{ex}} g_i(A \setminus A^*, A^* \setminus A) \\
&\le \sum_{X_i \notin X_{ex}} g_i(A \setminus A^*, A^* \setminus A) \le \sum_{X_i \notin X_{ex}} \sum_{a \in A \setminus A^*} g_i(a, a') \\
&\le - \sum_{X_i \in X_{ex}} \sum_{a \in A \setminus A^*} g_i(a, a') \\
&\le |X_{ex}| \cdot k \cdot \frac{2}{k} = 2|X_{ex}|. \tag{4}
\end{align*}

Finally, for each attribute $X_i \notin X_{ex}$ the loss relative to $X_i$, i.e., $|r_i^0 - \pi_i^0| + |r_i^1 - \pi_i^1|$, is at most 2. Thus, we get $g \le 2(|X| - |X_{ex}|)$, which together with Inequality (4) leads to $g \le |X|$. □


Is the bound $|X|$ from Theorem 1 a good result? One way to interpret this result is to observe that a solution that gives an exact match for half of the attributes, and is arbitrarily bad for the other half, is an $|X|$-approximate solution. We do not know whether the bound $|X|$ is reached, but we now show that a lower bound on the error made by the algorithm with $\ell = 1$ is $\frac{2}{3}|X|$.

Example 2 Consider $3p$ binary attributes $X_1, \ldots, X_{3p}$, $4\ell$ candidates $C = \{a_1, \ldots, a_{2\ell}, b_1, \ldots, b_{2\ell}\}$, and let $k = 2\ell$. For each $i \le p$, we have: for $j \le \ell$, $X_i(a_j) = 1$ and $X_i(b_j) = 1$; for $j > \ell$, $X_i(a_j) = 0$ and $X_i(b_j) = 0$. For each $i$ such that $p < i \le 2p$ we have: for $j \le \ell$, $X_i(a_j) = 1$ and $X_i(b_j) = 0$; for $j > \ell$, $X_i(a_j) = 0$ and $X_i(b_j) = 1$. For $i > 2p$ we have: for each $j$, $X_i(a_j) = 1$ and $X_i(b_j) = 0$. Finally, for $i \le 2p$ let $\pi_i^0 = \pi_i^1 = \frac{1}{2}$, and for $i > 2p$ let $\pi_i^0 = 1 - \pi_i^1 = 1$. It can be easily checked that $B = \{b_1, \ldots, b_{2\ell}\}$ is a perfect committee. Now, $A = \{a_1, \ldots, a_{2\ell}\}$ is locally optimal. To check this, consider replacing $a_r$ with $b_q$; there are two cases. In the first case, where ($r \le \ell$ and $q \le \ell$) or ($r > \ell$ and $q > \ell$), the replacement does not change the distance to the target distribution on each of the first $p$ attributes, increases the distance on each of the next $p$ attributes, and decreases the distance on each of the last $p$ attributes. For the second case, where ($r \le \ell$ and $q > \ell$) or ($r > \ell$ and $q \le \ell$), the line of reasoning is similar. Finally, $\|\pi, r(A)\|_1 = 2p = \frac{2}{3}|X|$.
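
Example 2 can be checked by direct enumeration for small parameters. The sketch below is illustrative only (helper names are ours); it builds the $a$- and $b$-candidates as described, and uses the fact that for a binary attribute $|r^0 - \pi^0| + |r^1 - \pi^1| = 2|r^1 - \pi^1|$.

```python
from fractions import Fraction

def example2(p, ell):
    """Return committees A, B and the target shares of value 1 for 3p attributes."""
    def cand(kind, j):  # j is 1-based, as in the example
        first = 1 if j <= ell else 0
        if kind == "a":
            return (first,) * p + (first,) * p + (1,) * p
        return (first,) * p + (1 - first,) * p + (0,) * p
    A = [cand("a", j) for j in range(1, 2 * ell + 1)]
    B = [cand("b", j) for j in range(1, 2 * ell + 1)]
    target_one = [Fraction(1, 2)] * (2 * p) + [Fraction(0)] * p
    return A, B, target_one

def l1(committee, target_one):
    """L1 loss for binary attributes: twice the gap on value 1."""
    k = len(committee)
    return sum(2 * abs(Fraction(sum(c[i] for c in committee), k) - t)
               for i, t in enumerate(target_one))

A, B, target = example2(p=2, ell=2)
single_swaps = [A[:r] + [b] + A[r + 1:] for r in range(len(A)) for b in B]
print(l1(A, target), l1(B, target))
print(min(l1(s, target) for s in single_swaps))  # no single swap beats A
```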

A better approximation bound can be obtained with $\ell = 2$:

Lemma 2 Consider $n$ buckets $X_1, \ldots, X_n$ such that in the $i$-th bucket $X_i$ there are $x_i$ white balls and $y_i$ black balls. Let $A$ denote the number of pairs of balls such that both balls in the pair belong to the same bucket and are of different color. Let us consider the procedure in which one iteratively selects a bucket and takes out two balls with different colors from the selected bucket. The procedure ends after $B$ steps, when no further steps are possible (in each bucket, either there are no balls anymore, or all balls have the same color). It holds that $A \ge \frac{B^2}{n}$.

Proof. Without loss of generality let us assume that for each $i$: $x_i \le y_i$. Thus, $B = \sum_i x_i$ and $A = \sum_i x_i y_i \ge \sum_i x_i^2 \ge \frac{(\sum_i x_i)^2}{n} = \frac{B^2}{n}$. The inequality $\sum_i x_i^2 \ge \frac{(\sum_i x_i)^2}{n}$ follows from Jensen's inequality applied to the quadratic function. □
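
A quick randomized sanity check of Lemma 2, given here as an illustrative sketch (the function name and the random test data are ours): for each bucket the number of bichromatic pairs is $x_i y_i$, and the number of removal steps is $\min(x_i, y_i)$.

```python
import random

def lemma2_quantities(buckets):
    """buckets: list of (white, black) counts. Returns (pairs, steps, n)."""
    pairs = sum(x * y for x, y in buckets)       # the quantity A of the lemma
    steps = sum(min(x, y) for x, y in buckets)   # the quantity B of the lemma
    return pairs, steps, len(buckets)

rng = random.Random(1)
for _ in range(5):
    buckets = [(rng.randint(0, 9), rng.randint(0, 9)) for _ in range(rng.randint(1, 8))]
    pairs, steps, n = lemma2_quantities(buckets)
    # Integer form of A >= B^2 / n, avoiding floating point.
    print(pairs * n >= steps * steps)
```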


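Lemma 3 Let $x_i, y_i, A_i$, $1 \le i \le n$, be real values satisfying the following constraints:

1. $x_i \ge \frac{A_i}{2n - 2(i-1)}$, for each $1 \le i \le n$,

2. $A_i \ge A_{i-1} - 2x_{i-1}$, for each $2 \le i \le n$,

3. $y_i \ge \frac{x_i}{2n - 2(i-1) - 1}$, for each $1 \le i \le n$.

Then:

\[ \sum_{i=1}^{n} y_i \ge \frac{A_1 \ln n}{4n}. \]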


Proof. We can view the set of inequalities 1, 2, 3 above as a linear program with $(3n - 1)$ variables (all $x_i$ and $y_i$ for $1 \le i \le n$, and $A_i$ for $2 \le i \le n$; we treat $A_1$ as a constant) and $(3n - 1)$ constraints. Thus, we know that $\sum_i y_i$ achieves the minimum when each of the above constraints is satisfied with equality.

We show by induction that the values $x_i = \frac{A_1}{2n}$ and $A_i = \frac{2n - 2(i-1)}{2n} A_1$ constitute the solution to the set of equalities that is derived by taking constraints 1 and 2 and treating them as equalities. It is easy to see that the base step, for $i = 1$, holds:

\[ x_1 = \frac{A_1}{2n - 2(1-1)} = \frac{A_1}{2n}, \qquad A_1 = \frac{2n - 2(1-1)}{2n} A_1. \]

Let us assume that from the equalities 1 and 2 taken for $i < j$, it follows that $x_i = \frac{A_1}{2n}$ and $A_i = \frac{2n - 2(i-1)}{2n} A_1$ for $i < j$. We will show that from equalities 1 and 2 for $i = j$ it follows that $x_j = \frac{A_1}{2n}$ and $A_j = \frac{2n - 2(j-1)}{2n} A_1$:

\begin{align*}
A_j &= A_{j-1} - 2x_{j-1} = \frac{2n - 2((j-1) - 1)}{2n} A_1 - \frac{2A_1}{2n} = \frac{2n - 2(j-1)}{2n} A_1, \\
x_j &= \frac{A_j}{2n - 2(j-1)} = \frac{1}{2n - 2(j-1)} \cdot \frac{2n - 2(j-1)}{2n} A_1 = \frac{A_1}{2n}.
\end{align*}

From constraint 3, treated as equality, we get:

\[ y_i = \frac{x_i}{2n - 2(i-1) - 1} = \frac{A_1}{2n(2n - 2(i-1) - 1)}. \]

Thus, we infer that $\sum_i y_i$ is minimized when $y_i = \frac{A_1}{2n(2n - 2(i-1) - 1)}$. We recall that $H_n$ denotes the $n$-th harmonic number ($H_n = \sum_{i=1}^{n} \frac{1}{i}$) and that $\ln(n+1) < H_n \le 1 + \ln(n)$. As a result we get:

\begin{align}
\sum_{i=1}^{n} y_i &\ge \sum_{i=1}^{n} \frac{A_1}{2n(2n - 2(i-1) - 1)} \ge \frac{A_1}{2n} \sum_{i=1}^{n} \frac{1}{2n - 2(i-1)} \tag{5} \\
&= \frac{A_1}{4n} \sum_{i=1}^{n} \frac{1}{n - i + 1} = \frac{A_1 H_n}{4n} \ge \frac{A_1 \ln n}{4n}. \tag{6}
\end{align}

□
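
The tight solution used in the proof of Lemma 3 can be evaluated numerically. The following sketch is only an illustration under our reading of the constraints (each $x_i$, $y_i$ set to its lower bound, and $A_i = A_{i-1} - 2x_{i-1}$); the function name is ours. It compares the resulting sum of the $y_i$ with the bound $\frac{A_1 \ln n}{4n}$.

```python
import math
from fractions import Fraction

def tight_solution(n, a1):
    """Satisfy constraints 1-3 with equality; return (list of x_i, sum of y_i)."""
    a = Fraction(a1)
    xs, total_y = [], Fraction(0)
    for i in range(1, n + 1):
        x = a / (2 * n - 2 * (i - 1))        # constraint 1 with equality
        y = x / (2 * n - 2 * (i - 1) - 1)    # constraint 3 with equality
        xs.append(x)
        total_y += y
        a = a - 2 * x                        # constraint 2 with equality
    return xs, total_y

for n in (2, 5, 20):
    xs, total_y = tight_solution(n, a1=1)
    print(n, float(total_y), math.log(n) / (4 * n))
```

In this tight solution every $x_i$ equals $\frac{A_1}{2n}$, matching the induction in the proof.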


Theorem 2 For binary domains ($|D_i| = 2$, for each $1 \le i \le p$), natural distributions, and the $\|\cdot\|_1$ loss function, the local search algorithm from Figure 1 with $\ell = 2$ is a $\frac{\ln(k/2)}{2\ln(k/2) - 1}\left(|X| + \frac{6|X|}{k}\right)$-additive-approximation algorithm for OptimalRepresentation.

Proof. In this proof we use an idea similar to the proof of Theorem 1, but the proof is technically more involved. As before, by $A^*$ and $A$ we denote the optimal solution and the solution returned by the local search algorithm, respectively. Similarly to the previous proof, by $X_{ex} \subseteq X$ we denote the set of all attributes for which $A$ achieves an exact match with $\pi$, i.e.,

\[ X_{ex} = \{X_i \in X : r_i^1(A) = \pi_i^1\}. \]

We also define the set $X_{aex} \subseteq X$ of all attributes for which $A$ achieves an almost exact match with $\pi$, i.e.,

\[ X_{aex} = \left\{X_i \in X : |r_i^1(A) - \pi_i^1| \le \frac{1}{k}\right\}. \]

Let $q' = \frac{|A \setminus A^*|}{2}$ and $q = \lfloor q' \rfloor$. Let us rename the items from $A \setminus A^*$ so that $A \setminus A^* = \{a_1, a_2, \ldots, a_{2q'}\}$, and the items from $A^* \setminus A$ so that $A^* \setminus A = \{a'_1, a'_2, \ldots, a'_{2q'}\}$. Hereinafter, we follow a convention in which the elements from $A^* \setminus A$ are marked with primes. The renaming described above allows us to define the following sequence of pairs $(a_1, a'_1), \ldots, (a_{2q'}, a'_{2q'})$ in which each element from $A \setminus A^*$ is paired with (assigned to) exactly one element from $A^* \setminus A$.

For each pair $(a_j, a'_j)$ and for each attribute $X_i$ we consider what happens if we replace $a_j$ in $A \setminus A^*$ with $a'_j$. One of three scenarios can happen after such a replacement:

1. The value $r_i^0(A)$ can increase by $\frac{1}{k}$ (in such case $r_i^1(A)$ decreases by $\frac{1}{k}$), which we denote by $X_i(a_j \leftrightarrow a'_j) = 1$,

2. The value $r_i^0(A)$ can decrease by $\frac{1}{k}$ (in such case $r_i^1(A)$ increases by $\frac{1}{k}$), which we denote by $X_i(a_j \leftrightarrow a'_j) = -1$, or

3. The value $r_i^0(A)$ can remain unchanged (in such case $r_i^1(A)$ also remains unchanged), which we denote by $X_i(a_j \leftrightarrow a'_j) = 0$.

We follow a procedure which, in $q$ consecutive steps, replaces pairs of items from $A \setminus A^*$ with pairs of items from $A^* \setminus A$. A pair $(a_i, a_j)$ is always replaced with $(a'_i, a'_j)$. In other words, when looking for a pair from $A^* \setminus A$ to replace $(a_i, a_j)$ we follow the assignment rule induced by the renaming described above. The way in which we create pairs within $A \setminus A^*$ for replacement (the way $(a_i, a_j)$ is selected in each of the $q$ consecutive steps) will be described later. After this whole procedure $A$ can differ from $A^*$ by at most one element, hence having distance to the optimal distribution at most equal to $|X| \cdot \frac{2}{k}$. Let us define the sequence of sets $A_1, A_2, \ldots, A_q$ in the following way: we define $A_1 = A \setminus A^*$, and we define $A_{j+1}$ as $A_j$ after removing the pair from $A \setminus A^*$ that was used in the replacement in the $j$-th step of our procedure.

As before, for each $B \subseteq A \setminus A^*$ and $B' \subseteq A^* \setminus A$, and for each attribute $X_i \in X$, we define the gain $g_i(B, B')$:

\[ g_i(B, B') = \sum_{j \in \{1,2\}} \left( |r_i^j(A) - \pi_i^j| - |r_i^j((A \setminus B) \cup B') - \pi_i^j| \right). \]

Similarly as in the proof of Theorem 1, we observe that for $X_i \notin X_{aex}$ the negative gains cumulate, i.e., that for all sequences of disjoint sets $B_1, B_2, \ldots, B_s$ and $B'_1, B'_2, \ldots, B'_s$ such that for every $1 \le j \le s$, $B_j \subseteq A \setminus A^*$, $B'_j \subseteq A^* \setminus A$, and $|B_j| = |B'_j| \le 2$, we have that:

\[ g_i\Big(\bigcup_j B_j, \bigcup_j B'_j\Big) \le \sum_j g_i(B_j, B'_j). \tag{7} \]

Why is this the case? If $X_i \notin X_{aex}$, then the distance between $A$ and the target distribution on attribute $X_i$ is at least equal to $2 \cdot \frac{2}{k}$. In other words, $|r_i^0(A) - \pi_i^0| \ge \frac{2}{k}$ and $|r_i^1(A) - \pi_i^1| \ge \frac{2}{k}$. Without loss of generality let us assume that $r_i^0(A) - \pi_i^0 \ge \frac{2}{k}$. Since each set $B_j$ and each set $B'_j$ has at most two elements, replacing $B_j$ with $B'_j$ can change the distance between $A$ and the target distribution, for each attribute, by at most $\frac{4}{k}$. Consequently, if $g_i(B_j, B'_j)$ is negative, then replacing $B_j$ with $B'_j$ makes the difference $r_i^0(A) - \pi_i^0$ even greater. Thus, each such replacement with a negative gain $g$ causes $A$ to move further from the target distribution by the value $|g|$. Naturally, each replacement with a positive gain $g$ causes $A$ to move closer to the target distribution by at most $g$. Consequently, after the sequence of replacements $\bigcup_j B_j \leftrightarrow \bigcup_j B'_j$ the distance on the attribute $X_i$ cannot improve by more than $\sum_j g_i(B_j, B'_j)$.

In contrast to the proof of Theorem 1, we note that here we require that $X_i \notin X_{aex}$ instead of $X_i \notin X_{ex}$: the above observation is not valid if $X_i \in X_{aex}$, even if $X_i \notin X_{ex}$.

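Next, for each $A_j$, and each attribute $X_i \in X_{ex}$, we define a set $W_j(X_i)$ of annihilating pairs as:

\[ W_j(X_i) = \left\{ ((a_x, X_i), (a_y, X_i)) : a_x \in A_j,\ a_y \in A_j,\ x < y,\ X_i(a_x \leftrightarrow a'_x) \cdot X_i(a_y \leftrightarrow a'_y) = -1 \right\}. \]

Footnote to the remark that Inequality (7) need not hold for $X_i \in X_{aex}$: Consider an example in which $\pi_i^1 = \frac{1}{k}$ and $r_i^1(A) = \frac{2}{k}$. Let us consider sets $B = \{b_1, b_2\}$, $B' = \{b'_1, b'_2\}$, $C = \{c_1, c_2\}$, $C' = \{c'_1, c'_2\}$ such that $X_i(c_1) = X_i(c_2) = X_i(b'_1) = X_i(b'_2) = d_i^1$, and $X_i(c'_1) = X_i(c'_2) = X_i(b_1) = X_i(b_2) = d_i^0$. Thus, we have that:

• Replacing $B$ with $B'$ results in $r_i^1(A) = \frac{4}{k}$.

• Replacing $C$ with $C'$ results in $r_i^1(A) = 0$.

• Replacing $B \cup C$ with $B' \cup C'$ results in $r_i^1(A) = \frac{2}{k}$.

We can repeat this reasoning for $r_i^0(A)$, thus having $g_i(B, B') = -\frac{4}{k}$, $g_i(C, C') = 0$ and $g_i(B \cup C, B' \cup C') = 0$.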
                                   X_i = X_1   X_i = X_2   X_i = X_3   X_i = X_4   X_i = X_5   X_i = X_6   X_i = X_7
$X_i(a_1 \leftrightarrow a'_1)$        1           1           1           1           0           0          -1
$X_i(a_2 \leftrightarrow a'_2)$       -1          -1           1           0           0           1           0
$X_i(a_3 \leftrightarrow a'_3)$        0          -1          -1           0           1           0           1
$X_i(a_4 \leftrightarrow a'_4)$       -1           1          -1          -1           1           0          -1


Table 1: An example illustrating the concept of annihilating pairs. In this example we have $X_{ex} = \{X_1, X_2, X_3, X_4, X_5, X_6, X_7\}$ and $A_1 = \{a_1, a_2, a_3, a_4\}$. We recall that $X_i(a_x \leftrightarrow a'_x) = 1$ if replacing $a_x$ with $a'_x$ moves $A$ further from the target distribution in one direction and $X_i(a_x \leftrightarrow a'_x) = -1$ if replacing $a_x$ with $a'_x$ moves $A$ further from the target distribution in the other direction. Here, we have $W_1(X_1) = \{((a_1, X_1), (a_2, X_1)), ((a_1, X_1), (a_4, X_1))\}$, $W_1(X_2) = \{((a_1, X_2), (a_2, X_2)), ((a_1, X_2), (a_3, X_2))\}$, $W_1(X_3) = \{((a_1, X_3), (a_3, X_3)), ((a_1, X_3), (a_4, X_3)), ((a_2, X_3), (a_3, X_3)), ((a_2, X_3), (a_4, X_3))\}$, etc. Further, $W_1 = W_1(X_1) \cup W_1(X_2) \cup W_1(X_3) \cup W_1(X_4) \cup W_1(X_5) \cup W_1(X_6) \cup W_1(X_7)$. There are many choices for the set $P$, but it must hold that $|P| = 6$; we give the following example: $P = \{((a_1, X_1), (a_2, X_1)), ((a_1, X_2), (a_2, X_2)), ((a_1, X_3), (a_3, X_3)), ((a_2, X_3), (a_4, X_3)), ((a_1, X_4), (a_4, X_4)), ((a_1, X_7), (a_3, X_7))\}$.


Intuitively, if $((a_x, X_i), (a_y, X_i)) \in W_j$, then both replacing $a_x$ with $a'_x$ and replacing $a_y$ with $a'_y$ move the original set $A$ (i.e., the set before any of the replacements) further from the target distribution for the attribute $X_i$, but replacing $\{a_x, a_y\}$ with $\{a'_x, a'_y\}$ does not change the distance of $A$ from the target distribution for the attribute $X_i$.

For each $j$, we set $W_j = \bigcup_i W_j(X_i)$. Let us denote by $P$ the number of annihilated pairs of candidates considered in the process of replacing items from $A \setminus A^*$ with items from $A^* \setminus A$. Formally, $P$ is the size of a maximal subset $\mathcal{P} \subseteq W_1$ composed of disjoint annihilating pairs, i.e., for each $i \le p$, for each $a_x$, and for each $a_y$, if $((a_x, X_i), (a_y, X_i)) \in \mathcal{P}$ then there exists no $b \neq a_y$ such that $((a_x, X_i), (b, X_i)) \in \mathcal{P}$ or $((b, X_i), (a_x, X_i)) \in \mathcal{P}$. From Lemma 2, after defining each bucket $X_i$ as containing $x_i$ white balls and $y_i$ black balls, where $x_i$ (respectively, $y_i$) is the number of candidates $a_j \in A_1$ with the value $X_i(a_j \leftrightarrow a'_j)$ equal to 1 (respectively, $-1$), it follows that $|W_1| \ge \frac{P^2}{|X_{ex}|}$. The concept of annihilating pairs is illustrated by the example in Table 1.

We are now ready to describe the way in which we select pairs from $A \setminus A^*$ in our procedure. In each step $j$, the pair $(a_{j,1}, a_{j,2})$ from $A \setminus A^*$ is selected in the following way. For each item $a$ let $s_{j,1}(a)$ be the number of pairs $p$ in $W_j$ such that $p = ((a, \cdot), (\cdot, \cdot))$ or $p = ((\cdot, \cdot), (a, \cdot))$, let $a_{j,1}$ be such that $s_{j,1}(a_{j,1}) = \max_{a \in A_j} s_{j,1}(a)$, and let $s_{j,1} = s_{j,1}(a_{j,1})$. Next, for each item $b$ let $s_{j,2}(b)$ be the number of pairs $p$ in $W_j$ such that $p = ((a_{j,1}, \cdot), (b, \cdot))$ or $p = ((b, \cdot), (a_{j,1}, \cdot))$, let $a_{j,2}$ be such that $s_{j,2}(a_{j,2}) = \max_{b \in A_j} s_{j,2}(b)$, and let $s_{j,2} = s_{j,2}(a_{j,2})$.

Let us consider the procedure described above on the example from Table 1. The item $a_1$ belongs to 8 pairs in $W_1$ ($a_1$ belongs to 2 pairs for each of the attributes $X_1$, $X_2$, and $X_3$, and to one pair for each of the attributes $X_4$ and $X_7$), thus $s_{1,1}(a_1) = 8$. Moreover, $s_{1,1}(a_2) = 5$, $s_{1,1}(a_3) = 6$, and $s_{1,1}(a_4) = 7$. Consequently, $a_1$ will be the item that is replaced with $a'_1$ in the first step: $a_{1,1} = a_1$ and $s_{1,1} = 8$. Further, $s_{1,2}(a_2) = 2$ (there are two annihilating pairs including $a_1$ and $a_2$, i.e., $((a_1, X_1), (a_2, X_1))$ and $((a_1, X_2), (a_2, X_2))$); similarly, $s_{1,2}(a_3) = 3$ and $s_{1,2}(a_4) = 3$. Thus, an arbitrary one of the two items $a_3$ and $a_4$, say $a_3$, will be the second item, replaced with $a'_3$ in the first step. In the second step only two items, $a_2$ and $a_4$, are left, so both will be replaced with $a'_2$ and $a'_4$ in the second step. Nevertheless, let us illustrate our definitions also in the second step of the replacement procedure. The set $A_2$ consists of the two remaining items: $a_2$ and $a_4$. We have $W_2 = \{((a_2, X_2), (a_4, X_2)), ((a_2, X_3), (a_4, X_3))\}$. Naturally, $s_{2,1}(a_2) = s_{2,1}(a_4) = s_{2,2}(a_2) = s_{2,2}(a_4) = 2$.
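
The counts in this worked example can be recomputed from Table 1. The sketch below is illustrative: the table rows are as we read them from the extracted text, the function names are ours, and a pair is treated as annihilating for an attribute when the two moves are nonzero and of opposite sign (our reading of the definition).

```python
from itertools import combinations

# Rows of Table 1: X_i(a_x <-> a'_x) for attributes X_1..X_7 (as read from the table).
moves = {
    1: [1, 1, 1, 1, 0, 0, -1],
    2: [-1, -1, 1, 0, 0, 1, 0],
    3: [0, -1, -1, 0, 1, 0, 1],
    4: [-1, 1, -1, -1, 1, 0, -1],
}

def annihilating_pairs(items):
    """All triples (x, y, attribute) with x < y whose moves cancel on that attribute."""
    return [(x, y, i + 1)
            for x, y in combinations(sorted(items), 2)
            for i in range(7)
            if moves[x][i] * moves[y][i] == -1]

def s1(item, pairs):
    """Number of annihilating pairs that contain the given item."""
    return sum(1 for x, y, _ in pairs if item in (x, y))

def s2(first, other, pairs):
    """Number of annihilating pairs formed by exactly these two items."""
    return sum(1 for x, y, _ in pairs if {x, y} == {first, other})

W1 = annihilating_pairs([1, 2, 3, 4])
print([s1(a, W1) for a in (1, 2, 3, 4)])      # s_{1,1}(a_1), ..., s_{1,1}(a_4)
print([s2(1, b, W1) for b in (2, 3, 4)])      # s_{1,2}(a_2), s_{1,2}(a_3), s_{1,2}(a_4)
print(annihilating_pairs([2, 4]))             # W_2 after removing a_1 and a_3
```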

We now want to derive bounds on the values $s_{j,1}$ and $s_{j,2}$. The following inequalities hold:

1. $s_{j,1} \ge \frac{|W_j|}{2q' - 2(j-1)}$ for each $1 \le j \le q$.

$W_j$ contains pairs of items belonging to $A_j$. $A_1$ has $2q'$ items, and $A_{j+1}$ is obtained from $A_j$ by removing two items. Consequently, $A_j$ has $2q' - 2(j-1)$ items, and thus, $W_j$ contains pairs of $2q' - 2(j-1)$ different items. From the pigeonhole principle it follows that there exists an item that belongs to at least $\frac{2|W_j|}{2q' - 2(j-1)}$ pairs. Naturally, we also get the weaker constraint $s_{j,1} \ge \frac{|W_j|}{2q' - 2(j-1)}$.

2. $|W_j| \ge |W_{j-1}| - 2s_{j-1,1}$ for each $2 \le j \le q$.

Each item in $A_{j-1}$ belongs to at most $s_{j-1,1}$ pairs (this follows from the definition of $s_{j-1,1}$). $W_j$ contains all pairs that $W_{j-1}$ contained, except for the pairs involving $a_{j-1,1}$ or $a_{j-1,2}$ (to obtain $A_j$, we removed these two items from $A_{j-1}$). Consequently, $W_j$ is obtained from $W_{j-1}$ by removing at most $2s_{j-1,1}$ pairs of items.

3. $s_{j,2} \ge \frac{s_{j,1}}{2q' - 2(j-1) - 1}$ for each $1 \le j \le q$.

In $W_j$, there are $s_{j,1}$ pairs of items involving $a_{j,1}$. As we noted before, $W_j$ contains pairs of $2q' - 2(j-1)$ different items. Thus, in $W_j$, $a_{j,1}$ is paired with at most $2q' - 2(j-1) - 1$ items. From the pigeonhole principle it follows that $a_{j,1}$ must be paired with some item at least $\frac{s_{j,1}}{2q' - 2(j-1) - 1}$ times.

[Figure 2: three panels a), b), c), each showing the moves $X_i(a_1 \leftrightarrow a'_1)$ and $X_i(a_2 \leftrightarrow a'_2)$ relative to the target distribution.]

Figure 2: Figure illustrating that for $X_i \in X_{ex}$, $g_i(\{a_1, a_2\}, \{a'_1, a'_2\})$ is greater than $(g_i(a_1, a'_1) + g_i(a_2, a'_2))$ if and only if $((a_1, X_i), (a_2, X_i))$ is an annihilating pair. The figure presents 3 scenarios: a) $((a_1, X_i), (a_2, X_i))$ is an annihilating pair. Both replacing $a_1$ with $a'_1$ and replacing $a_2$ with $a'_2$ move us further from the target distribution for attribute $X_i$ (the target distribution is marked as a black dot), thus $g_i(a_1, a'_1) = -\frac{2}{k}$ and $g_i(a_2, a'_2) = -\frac{2}{k}$. However, these changes annihilate, and $g_i(\{a_1, a_2\}, \{a'_1, a'_2\}) = 0$. b) $g_i(a_1, a'_1) = -\frac{2}{k}$ and $g_i(a_2, a'_2) = -\frac{2}{k}$, but these changes do not annihilate, and thus $g_i(\{a_1, a_2\}, \{a'_1, a'_2\}) = -\frac{4}{k}$. c) $g_i(a_1, a'_1) = -\frac{2}{k}$ and $g_i(a_2, a'_2) = 0$; if at least one change does not move the solution against the target distribution, the changes do not annihilate, and $g_i(\{a_1, a_2\}, \{a'_1, a'_2\}) = g_i(a_1, a'_1) + g_i(a_2, a'_2)$.

From Lemma 3 we get that:

\[ \sum_{j=1}^{q} s_{j,2} \ge \frac{|W_1| \ln q}{4q}. \tag{8} \]

Before we proceed further let us make three observations regarding annihilating pairs. First, we note that for each $X_i \in X_{ex}$, and each $a_x$ and $a_y$, if the value $g_i(\{a_x, a_y\}, \{a'_x, a'_y\})$ is different from $(g_i(a_x, a'_x) + g_i(a_y, a'_y))$ then it is greater than $(g_i(a_x, a'_x) + g_i(a_y, a'_y))$ by $\frac{4}{k}$. We also note that $g_i(\{a_x, a_y\}, \{a'_x, a'_y\})$ is greater than $(g_i(a_x, a'_x) + g_i(a_y, a'_y))$ if and only if the changes $X_i(a_x \leftrightarrow a'_x)$ and $X_i(a_y \leftrightarrow a'_y)$ annihilate (this is illustrated in Figure 2). Further, we recall that the value $s_{j,2}$ counts all attributes for which $a_{j,1}$ and $a_{j,2}$ constitute an annihilating pair. Thus, for each $1 \le j \le q$:

\[ \sum_{X_i \in X_{ex}} g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\}) = \sum_{X_i \in X_{ex}} \left( g_i(a_{j,1}, a'_{j,1}) + g_i(a_{j,2}, a'_{j,2}) \right) + s_{j,2} \cdot \frac{4}{k}. \tag{9} \]
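[Figure 3: a sequence of moves on attribute $X_i$ relative to the target distribution, with the label "4 pairs that annihilated" and the resulting value $-g_i(A \setminus A^*, A^* \setminus A)$.]

Figure 3: Figure illustrating the effect of replacing 10 items for an attribute $X_i \in X_{ex}$. Each replacement imposes a negative gain: $g_i(a_j, a'_j) = -\frac{2}{k}$ for $1 \le j \le 10$. Thus, $\sum_{a \in A \setminus A^*} g_i(a, a') = -\frac{20}{k}$. In this example four pairs annihilated, and, consequently, $g_i(A \setminus A^*, A^* \setminus A) = -\frac{4}{k}$.

Our second observation is similar in spirit to the first one. We note that for each $X_i \in X_{ex}$:

\[ g_i(A \setminus A^*, A^* \setminus A) - \sum_{a \in A \setminus A^*} g_i(a, a') = \text{(the number of pairs that annihilated for } X_i) \times \frac{4}{k}. \]

The above equality is illustrated in Figure 3. As a consequence, we get that:

\[ \sum_{X_i \in X_{ex}} \Big[ g_i(A \setminus A^*, A^* \setminus A) - \sum_{a \in A \setminus A^*} g_i(a, a') \Big] = \text{(the number of pairs that annihilated)} \times \frac{4}{k}. \]

We recall that after the replacement procedure $A$ can differ from $A^*$ by at most one element, hence having distance to the optimal distribution at most equal to $|X| \cdot \frac{2}{k}$. Thus:

\[ \sum_{X_i \in X_{ex}} \Big( g_i(A \setminus A^*, A^* \setminus A) - \sum_{j=1}^{q} \left( g_i(a_{j,1}, a'_{j,1}) + g_i(a_{j,2}, a'_{j,2}) \right) \Big) \le P \cdot \frac{4}{k} + |X| \cdot \frac{2}{k}. \tag{10} \]

Our third observation says that:

\[ \sum_{X_i \in X_{aex} \setminus X_{ex}} \Big( g_i(A \setminus A^*, A^* \setminus A) - \sum_{j=1}^{q} g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\}) \Big) \le |X_{aex} \setminus X_{ex}|. \tag{11} \]

Where does Inequality (11) come from? Let us use the geometric interpretation, like the one from Figure 3. Let us consider an attribute $X_i \in X_{aex} \setminus X_{ex}$. For $X_i$, $A$ lies at a distance of $\frac{1}{k}$ on the left or on the right of the target distribution. Without loss of generality, let us assume it lies on the right. Now, if $g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\}) < 0$, replacing $(a_{j,1}, a_{j,2})$ with $(a'_{j,1}, a'_{j,2})$ moves the current solution right. If $g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\}) = \frac{2}{k}$, replacing $(a_{j,1}, a_{j,2})$ with $(a'_{j,1}, a'_{j,2})$ moves the current solution by $\frac{1}{k}$ to the left. If $g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\}) = 0$, replacing $(a_{j,1}, a_{j,2})$ with $(a'_{j,1}, a'_{j,2})$ either does not move the solution or moves it by $\frac{2}{k}$ to the left.

Let us define $y_i = g_i(A \setminus A^*, A^* \setminus A) - \sum_{j=1}^{q} g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\})$. If the solution moves $q$ times to the right, then the total gain $-\sum_{j=1}^{q} g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\})$ will be maximized, achieving $q \cdot \frac{4}{k}$. In such case, however, the value $g_i(A \setminus A^*, A^* \setminus A)$ will be equal to $-q \cdot \frac{4}{k}$, and thus the value $y_i$ will be equal to 0. After some consideration, the reader will see that the value $y_i$ is maximized if the current solution moves $\frac{q}{2}$ times right and $\frac{q}{2}$ times left, each time by the value of $\frac{2}{k}$. This way, the moves to the right induce the total gain of $\frac{q}{2} \cdot \frac{4}{k}$, the moves to the left induce the zero gain, but as a consequence, the current solution for $X_i$ does not change ($g_i(A \setminus A^*, A^* \setminus A) = 0$). Thus, for each $X_i \in X_{aex} \setminus X_{ex}$, $y_i$ is upper bounded by $\frac{q}{2} \cdot \frac{4}{k} \le 1$, which proves Inequality (11).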
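We can further proceed with the proof by observing that from the condition in the "while" loop we get that for each $1 \le j \le q$:

\[ 0 \ge \sum_i g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\}). \]

From Equality (9):

\[ 0 \ge \sum_{X_i \in X_{ex}} \left( g_i(a_{j,1}, a'_{j,1}) + g_i(a_{j,2}, a'_{j,2}) \right) + s_{j,2} \cdot \frac{4}{k} + \sum_{X_i \notin X_{ex}} g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\}). \]

Thus, we get:

\[ \sum_{X_i \notin X_{ex}} g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\}) \le - \sum_{X_i \in X_{ex}} \left( g_i(a_{j,1}, a'_{j,1}) + g_i(a_{j,2}, a'_{j,2}) \right) - s_{j,2} \cdot \frac{4}{k}. \tag{12} \]

Next, we give the following sequence of inequalities:

\[ g = \sum_i g_i(A \setminus A^*, A^* \setminus A) = \sum_{X_i \in X_{ex}} g_i(A \setminus A^*, A^* \setminus A) + \sum_{X_i \in X_{aex} \setminus X_{ex}} g_i(A \setminus A^*, A^* \setminus A) + \sum_{X_i \notin X_{aex}} g_i(A \setminus A^*, A^* \setminus A). \]

From Inequality (7), for all $X_i \notin X_{aex}$, the gain of a union of replacements is at most the sum of the gains of the individual replacements $\{a_{j,1}, a_{j,2}\} \leftrightarrow \{a'_{j,1}, a'_{j,2}\}$. Since the sets $A \setminus A^*$ and $\bigcup_{j=1}^{q} \{a_{j,1}, a_{j,2}\}$ can differ by at most one item (which induces at most $\frac{2|X|}{k}$ distance to the optimal solution), we have that

\[ \sum_{X_i \notin X_{aex}} g_i(A \setminus A^*, A^* \setminus A) \le \sum_{X_i \notin X_{aex}} \sum_{j=1}^{q} g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\}) + \frac{2|X|}{k}. \]

And, as a consequence:

\begin{align*}
g &\le \sum_{X_i \in X_{ex}} g_i(A \setminus A^*, A^* \setminus A) + \sum_{X_i \in X_{aex} \setminus X_{ex}} g_i(A \setminus A^*, A^* \setminus A) \\
&\quad + \sum_{X_i \notin X_{aex}} \sum_{j=1}^{q} g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\}) + \frac{2|X|}{k} \\
&= \sum_{X_i \in X_{ex}} g_i(A \setminus A^*, A^* \setminus A) + \sum_{X_i \in X_{aex} \setminus X_{ex}} g_i(A \setminus A^*, A^* \setminus A) \\
&\quad + \sum_{X_i \notin X_{ex}} \sum_{j=1}^{q} g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\}) - \sum_{X_i \in X_{aex} \setminus X_{ex}} \sum_{j=1}^{q} g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\}) + \frac{2|X|}{k}.
\end{align*}

From Inequality (11) we get:

\[ g \le \sum_{X_i \in X_{ex}} g_i(A \setminus A^*, A^* \setminus A) + \sum_{X_i \notin X_{ex}} \sum_{j=1}^{q} g_i(\{a_{j,1}, a_{j,2}\}, \{a'_{j,1}, a'_{j,2}\}) + \frac{2|X|}{k} + |X_{aex} \setminus X_{ex}|. \]

From Inequality (12):

\[ g \le \frac{2|X|}{k} + |X_{aex} \setminus X_{ex}| + \sum_{X_i \in X_{ex}} \Big( g_i(A \setminus A^*, A^* \setminus A) - \sum_{j=1}^{q} \left( g_i(a_{j,1}, a'_{j,1}) + g_i(a_{j,2}, a'_{j,2}) \right) \Big) - \frac{4}{k} \sum_{j=1}^{q} s_{j,2}. \]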
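From Inequality (8):

\[ g \le \frac{2|X|}{k} + |X_{aex} \setminus X_{ex}| - \frac{|W_1| \ln q}{4q} \cdot \frac{4}{k} + \sum_{X_i \in X_{ex}} \Big( g_i(A \setminus A^*, A^* \setminus A) - \sum_{j=1}^{q} \left( g_i(a_{j,1}, a'_{j,1}) + g_i(a_{j,2}, a'_{j,2}) \right) \Big). \]

From Inequality (10):

\[ g \le \frac{4|X|}{k} + |X_{aex} \setminus X_{ex}| - \frac{|W_1| \ln q}{4q} \cdot \frac{4}{k} + P \cdot \frac{4}{k}. \]

As we noted before, from Lemma 2 we have that $|W_1| \ge \frac{P^2}{|X_{ex}|}$. Thus:

\[ g \le \frac{4|X|}{k} + |X_{aex} \setminus X_{ex}| + \frac{4}{k} \left( P - \frac{P^2 \ln q}{4|X_{ex}| q} \right). \]

Since $q \le \frac{k}{2}$ and since the function $\frac{\ln x}{x}$ is decreasing for $x \ge e$:

\[ g \le \frac{4|X|}{k} + |X_{aex} \setminus X_{ex}| + \frac{4}{k} \left( P - \frac{P^2 \ln(k/2)}{2|X_{ex}| k} \right). \]

The function $f(P) = P - \frac{P^2 \ln(k/2)}{2|X_{ex}| k}$ takes its maximum for $P = \frac{|X_{ex}| k}{\ln(k/2)}$. Thus:

\[ g \le \frac{4|X|}{k} + |X_{aex} \setminus X_{ex}| + \frac{4}{k} \cdot \frac{|X_{ex}| k}{2\ln(k/2)} = \frac{4|X|}{k} + |X_{aex} \setminus X_{ex}| + \frac{2|X_{ex}|}{\ln(k/2)}. \]

Since our local search algorithm for $\ell = 2$ also tries to perform local swaps on single items, we can repeat the analysis from the proof of Theorem 1. Thus, using Inequality (4) from there, we get that $g \le 2|X_{ex}|$, and as a consequence: $\left( \frac{1}{2} - \frac{1}{\ln(k/2)} \right) g \le |X_{ex}| - \frac{2|X_{ex}|}{\ln(k/2)}$.

For each attribute $X_i \in X \setminus X_{aex}$ the distance between $A$ and the target distribution is bounded by 2. For $X_i \in X_{aex}$ this distance is bounded by $\frac{2}{k}$. Thus, we get that $g \le 2(|X| - |X_{ex}| - |X_{aex} \setminus X_{ex}|) + |X_{aex} \setminus X_{ex}| \cdot \frac{2}{k}$, and so:

\begin{align*}
g + \left( \frac{1}{2} - \frac{1}{\ln(k/2)} \right) g + \frac{1}{2} g
&\le \frac{4|X|}{k} + |X_{aex} \setminus X_{ex}| + \frac{2|X_{ex}|}{\ln(k/2)} + |X_{ex}| - \frac{2|X_{ex}|}{\ln(k/2)} \\
&\quad + (|X| - |X_{ex}| - |X_{aex} \setminus X_{ex}|) + \frac{|X_{aex} \setminus X_{ex}|}{k} \\
&\le |X| + \frac{6|X|}{k}.
\end{align*}

Finally, we get:

\[ g \le \frac{\ln(k/2)}{2\ln(k/2) - 1} \left( |X| + \frac{6|X|}{k} \right), \]

which proves the thesis. □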

Since a brute-force algorithm can be used to compute an optimal solution for small values of $k$, Theorem 2 implies that for every $\epsilon > 0$ we can achieve an additive approximation of $\frac{1}{2}(|X| + \epsilon)$, that is, we can guarantee that the solution returned by our algorithm will be at least 4 times better than a solution that is arbitrarily bad on each attribute. A natural open question is whether the local search algorithm achieves even better approximation guarantees for larger values of $\ell$.
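
To see how the guarantee approaches $\frac{1}{2}|X|$ as $k$ grows, one can evaluate the bound numerically. The sketch below assumes the bound has the form $\frac{\ln(k/2)}{2\ln(k/2) - 1}\left(|X| + \frac{6|X|}{k}\right)$, which is our reading of the final inequality of the proof; the function name is ours.

```python
import math

def theorem2_bound(num_attributes, k):
    """Assumed form of the bound; only meaningful when 2*ln(k/2) > 1."""
    log_term = math.log(k / 2)
    return log_term / (2 * log_term - 1) * (num_attributes + 6 * num_attributes / k)

for k in (10, 100, 10_000, 1_000_000):
    # Bound per attribute; it decreases slowly toward 1/2.
    print(k, round(theorem2_bound(1, k), 4))
```

The convergence is logarithmic in $k$, which is why the text combines the bound with brute force for small $k$.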

One may argue that the restriction to normal target distributions is a strong one. However, for a given
vector of target distributions $\pi$, we can easily find a vector $\pi_N$ of normal target distributions such that
$\|\pi, \pi_N\|_1 <$ 2A. Thus, the results from Theorems 1 and 2 can be modified by providing an approximation
ratio worse by an additive value of 2A, valid for arbitrary target distributions. Again, since an optimal
solution can easily be computed for small values of $k$, we can get arbitrarily close to the approximation
guarantees given by Theorems 1 and 2 even for non-normal target distributions.
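
One way to obtain a nearby normal distribution is largest-remainder (Hamilton) rounding, applied to each attribute separately. The sketch below is only an illustration, under the assumption that "normal" means that every $k \cdot \pi_i^j$ is an integer; the function name `nearest_normal` is hypothetical, and the input is assumed to sum to 1.

```python
from fractions import Fraction
from math import floor

def nearest_normal(pi, k):
    """Round one target distribution to multiples of 1/k (largest remainder).

    Assumes sum(pi) == 1; exact rational arithmetic avoids rounding ties
    caused by floating point.
    """
    quotas = [Fraction(x) * k for x in pi]      # ideal counts k * pi^j
    counts = [floor(q) for q in quotas]         # lower quotas
    leftover = k - sum(counts)                  # seats still to hand out
    # Hand the remaining seats to the values with the largest remainders.
    order = sorted(range(len(pi)),
                   key=lambda j: quotas[j] - counts[j],
                   reverse=True)
    for j in order[:leftover]:
        counts[j] += 1
    return [Fraction(c, k) for c in counts]

# Research-area targets from the introduction (55%, 25%, 20%) with k = 8.
pi = [Fraction(11, 20), Fraction(1, 4), Fraction(1, 5)]
pi_n = nearest_normal(pi, 8)
l1 = sum(abs(x - y) for x, y in zip(pi, pi_n))
```

Here the ideal counts are 4.4, 2 and 1.6, so the single leftover seat goes to the third value and the rounded distribution is (4/8, 2/8, 2/8).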

Below we show a lower bound of ^ for the approximation ratio of the local search algorithm from 
Figure [T] with i = 2 . 

Example 3 Consider 5p binary attributes Xi ,..., X^p, 6 £ and the set of distinct candidates C = 
{oi,..., a^, Oj,..., a^, &i,..., ,..., Cl,..., c^, Cj,..., c^} (in our database there exists a large 

number p of copies of each candidate from C). For each i, we have: 



          X_1   X_2   X_3   X_4   X_5   X_6   X_7
a_i        1     0     1     1     0     0     1
a'_i       0     1     0     0     1     1     1
b_i        0     0     0     0     0     0     0
b'_i       0     0     1     1     1     1     0
c_i        1     1     1     1     0     0     0
c'_i       1     1     0     0     1     1     0


We note that for each candidate the value of the attribute $X_3$ is the same as that of $X_4$, and the value of the
attribute $X_5$ is the same as that of $X_6$. For $i \in \{1, 2, 3, 4, 5, 6\}$ let $\pi_i^0 = \pi_i^1 = \frac{1}{2}$, and let $\pi_7^0 = 1 - \pi_7^1 = 1$.

Let $k = 4p$. It can be easily checked that the set consisting of $p$ copies of candidates $b_i$, $b'_i$, $c_i$, $c'_i$ is a
perfect committee. On the other hand, the set $A$ consisting of $2p$ copies of candidates $a_i$ and $a'_i$ is locally
optimal. Indeed, replacing candidate $a_i$ or $a'_i$ with $b_i$ or $b'_i$ moves the solution closer to the target distribution
on $X_7$, but further from the target distribution on $X_1$ or $X_2$. The same situation happens if we replace
candidates $a_i$ or $a'_i$ with $c_i$ or $c'_i$. If we replace two a-candidates with the pair consisting of one b-candidate
($b_i$ or $b'_i$) and one c-candidate ($c_i$ or $c'_i$), then such a replacement will move the solution closer to the
target distribution on $X_7$, but will move the solution further away on two attributes from $\{X_3, X_4, X_5, X_6\}$.

Finally, ||7 r, 7’(A)||i = 2p = ||Ar|. 
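
The claims of Example 3 can be checked mechanically on a small instance. The sketch below rests on several assumptions that are not stated in this form in the text: it uses a single block of the seven attributes from the table, measures the distance as the sum of per-attribute $\ell_1$ distances, assumes that enough copies of every candidate are available, and treats a swap as an improvement only if it strictly lowers the distance. The names `CAND`, `PI1`, `distance` and `locally_optimal` are illustrative.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement

# Rows of the table in Example 3 (attributes X_1..X_7), one block only.
CAND = {
    "a":  (1, 0, 1, 1, 0, 0, 1),
    "a'": (0, 1, 0, 0, 1, 1, 1),
    "b":  (0, 0, 0, 0, 0, 0, 0),
    "b'": (0, 0, 1, 1, 1, 1, 0),
    "c":  (1, 1, 1, 1, 0, 0, 0),
    "c'": (1, 1, 0, 0, 1, 1, 0),
}
# Target share of value 1: 1/2 on X_1..X_6 and 0 on X_7.
PI1 = [Fraction(1, 2)] * 6 + [Fraction(0)]

def distance(committee):
    """Sum over attributes of the L1 distance to the target distribution."""
    k = len(committee)
    total = Fraction(0)
    for j, target in enumerate(PI1):
        share = Fraction(sum(CAND[c][j] for c in committee), k)
        total += 2 * abs(share - target)  # binary: both values deviate equally
    return total

def locally_optimal(committee, ell=2):
    """True if no swap of at most `ell` members strictly lowers the distance."""
    base = distance(committee)
    for size in range(1, ell + 1):
        for out in combinations(range(len(committee)), size):
            rest = [c for i, c in enumerate(committee) if i not in out]
            # Assumption: the database holds enough copies of every candidate.
            for added in combinations_with_replacement(CAND, size):
                if distance(rest + list(added)) < base:
                    return False
    return True

p = 2                                   # committee size k = 4p = 8
perfect = ["b", "b'", "c", "c'"] * p    # p copies of each b- and c-candidate
A = ["a", "a'"] * (2 * p)               # 2p copies of each a-candidate
```

Under these assumptions, `distance(perfect)` is 0 and `distance(A)` is 2 (the whole deviation sits on $X_7$), and no swap of one or two members strictly improves `A`, although some swaps leave the distance unchanged.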


6.3 Parameterized Complexity 


In this section, we study the parameterized complexity of the problem of finding a perfect committee. We
are specifically interested in whether for some natural parameters there exist fixed-parameter tractable (FPT)
algorithms. We recall that a problem is FPT for a parameter $P$ if each of its instances $I$ can be solved in time
$O(f(P) \cdot \mathrm{poly}(|I|))$ for some computable function $f$.

From the point of view of parameterized complexity, FPT is seen as the class of easy problems. There is
also a whole hierarchy of hardness classes, FPT $\subseteq$ W[1] $\subseteq$ W[2] $\subseteq \cdots$ (for details, we point the reader to
appropriate overviews).

Obviously, the problem admits an FPT algorithm for the parameter m. Now, we present a negative result 
for parameter k (committee size) and a positive result for the parameter p (number of attributes). 


Theorem 3 The problem of deciding whether there exists a perfect committee is W[1]-hard for the parameter
$k$, even for binary domains.

Proof. By reduction from the W[1]-complete PerfectCode problem. Let $I$ be an instance of
PerfectCode that consists of a graph $G = (V, E)$ and a positive integer $k$. We ask whether there exists
$V' \subseteq V$ of size $k$ such that each vertex in $V$ is adjacent to exactly one vertex from $V'$ (by convention, a vertex is
adjacent to itself). From $I$ we construct the following instance $I'$ of the perfect committee problem. For each
$v \in V$ there is a binary attribute $X_v$ and a candidate $c_v$. For each $u, v \in V$, $X_v(c_u) = 1$ if and only if $u$ and
$v$ are adjacent in $G$. We look for a committee of size $k$. For each $u$, $\pi_u^1 = 1 - \pi_u^0 = \frac{1}{k}$. It is easy to see that
perfect codes in $I$ correspond to perfect committees in $I'$. □
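
The construction can be illustrated on small graphs. The following sketch builds the candidate vectors from an adjacency structure and then searches for perfect committees by brute force; the exhaustive search is only a sanity check for tiny inputs, not part of the reduction, and the function names and the two example graphs are illustrative assumptions.

```python
from itertools import combinations

def build_instance(adj):
    """Map a graph {vertex: set of neighbours} to candidates c_u.

    Attribute X_v has value 1 on candidate c_u iff u and v are adjacent,
    where every vertex counts as adjacent to itself.
    """
    vertices = sorted(adj)
    return {u: tuple(int(v == u or v in adj[u]) for v in vertices)
            for u in vertices}

def perfect_committees(adj, k):
    """All size-k committees in which every attribute has exactly one 1."""
    cand = build_instance(adj)
    n = len(cand)
    found = []
    for committee in combinations(sorted(cand), k):
        ones = [sum(cand[u][j] for u in committee) for j in range(n)]
        if all(count == 1 for count in ones):  # k * pi_v^1 = k * (1/k) = 1
            found.append(committee)
    return found

# A path on six vertices has the perfect code {1, 4}; a 4-cycle has none.
path6 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```

For the path, the only perfect committee of size 2 consists of the candidates for vertices 1 and 4, whose closed neighbourhoods partition the vertex set; for the 4-cycle no committee of size 2 is perfect.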


Theorem 4 For binary domains, there is an FPT algorithm for the perfect committee problem for parameter
$p$.

Proof. Each item can be viewed as a vector of values indexed with the attributes; there are $2^p$ such possible
vectors: $v_1, \ldots, v_{2^p}$. For each $v_i$, let $a_i$ denote the number of items that correspond to $v_i$. Consider the
following integer linear program, in which each variable $b_i$ is the number of candidates corresponding to $v_i$
in a perfect committee.

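$$
\begin{array}{lll}
\text{minimize} & \sum_{i=1}^{2^p} b_i & \\
\text{subject to:} & & \\
(a) & b_i \ge 0, & 1 \le i \le 2^p \\
(b) & b_i \le a_i, & 1 \le i \le 2^p \\
(c) & \sum_{i=1}^{2^p} b_i = k & \\
(d) & \sum_{i \,:\, v_i[j] = 1} b_i = k \pi_j^1, & 1 \le j \le p
\end{array}
$$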


This integer linear program has $2^p$ variables; thus, by the result of Lenstra (Section 5 of his paper), it can be
solved in FPT time for parameter $p$. This completes the proof. □


Example 4 Let $p = 2$, $k = 5$, and let the candidate database $C$ consist of 4 candidates with value vector
$v_1 = (0, 0)$, 2 candidates with value vector $v_2 = (1, 0)$, 2 candidates with value vector $v_3 = (0, 1)$ and 2 candidates
with value vector $v_4 = (1, 1)$. Let $\pi = ((0.2, 0.8), (0.6, 0.4))$. The integer linear program is

$$
\begin{array}{lll}
\text{minimize} & b_1 + b_2 + b_3 + b_4 & \\
\text{subject to:} & & \\
(a) & b_i \ge 0, & 1 \le i \le 4 \\
(b) & b_1 \le 4;\; b_2 \le 2;\; b_3 \le 2;\; b_4 \le 2 & \\
(c) & b_1 + b_2 + b_3 + b_4 = 5 & \\
(d) & b_1 + b_3 = 1;\; b_1 + b_2 = 3 &
\end{array}
$$

and a solution is $(b_1 = 1, b_2 = 2, b_3 = 0, b_4 = 2)$: a perfect committee is obtained by taking one candidate
with value vector $(0, 0)$, two candidates with value vector $(1, 0)$, and two with value vector $(1, 1)$.

Now, consider the database $C'$ consisting of 5 candidates with value vector $v_1 = (0, 0)$, 2 candidates with value vector
$v_2 = (1, 0)$, 2 candidates with value vector $v_3 = (0, 1)$ and 1 candidate with value vector $v_4 = (1, 1)$.
Let $\pi = ((0.2, 0.8), (0.6, 0.4))$: then the corresponding constraints are inconsistent and there is no perfect
committee. Indeed, (c) and (d) give $b_4 = 2 - b_3$, so $b_4 \le 1$ forces $b_3 = 1$, hence $b_1 = 0$ and $b_2 = 3 > 2$.
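
Both parts of Example 4 can be reproduced by enumerating the integer points that satisfy constraints (a) to (d). The sketch below uses plain exhaustive enumeration, which is exponential in the number of variables and is not Lenstra's algorithm; it only serves to check the example. The function name and the bit-based ordering of the value vectors are assumptions chosen to match $v_1, \ldots, v_4$ above.

```python
from fractions import Fraction
from itertools import product

def perfect_committee_counts(supply, k, pi1):
    """Enumerate all integer solutions b of constraints (a)-(d).

    supply[i] is a_i for the i-th value vector; vectors are ordered so that
    bit j of i is the value of attribute j (this matches v_1..v_4 of
    Example 4). pi1[j] is the target share of value 1 on attribute j.
    """
    p = len(pi1)
    solutions = []
    for b in product(*(range(a + 1) for a in supply)):   # (a) and (b)
        if sum(b) != k:                                   # (c)
            continue
        if all(sum(b[i] for i in range(2 ** p) if (i >> j) & 1) == k * pi1[j]
               for j in range(p)):                        # (d)
            solutions.append(b)
    return solutions

# pi = ((0.2, 0.8), (0.6, 0.4)): shares of value 1 are 0.8 and 0.4.
pi1 = [Fraction(4, 5), Fraction(2, 5)]
first = perfect_committee_counts([4, 2, 2, 2], 5, pi1)    # database C
second = perfect_committee_counts([5, 2, 2, 1], 5, pi1)   # database C'
```

Constraint (d) is written here for value 1 ($b_2 + b_4 = 4$ and $b_3 + b_4 = 2$), which together with (c) is equivalent to the form $b_1 + b_3 = 1$, $b_1 + b_2 = 3$ used in the example. For $C$ the only solution is $(1, 2, 0, 2)$, and for $C'$ the list of solutions is empty.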

We conclude this section with a short discussion. Finding an optimal committee is likely to be difficult if
the candidate database $C$ is large and the number of attributes is not small. Assume that $|C|$ is large compared to
the size of the domain $\prod_{i=1}^{p} |D_i|$, that each attribute value appears often enough in $C$, and that there is no
strong correlation between attributes in $C$: then, the larger $|C|$, the more likely $C$ satisfies Full Supply, in
which case finding an optimal committee is easy. The really difficult cases are when $|C|$ is not significantly
larger than the domain, or when $C$ shows a high correlation between attributes.

7 Conclusion 

We have defined and studied multi-attribute generalizations of a well-known apportionment method (Hamilton),
albeit with motivations that go far beyond party-list elections (such as the selection of a common set of
items). We have shown positive and negative results concerning the properties satisfied by these generalizations
and their computation, but a lot remains to be done. Note that other largest remainder apportionment
methods can be generalized in a similar way, but it is unclear how largest-average methods can be generalized.